Knowledge Mapping the Corpus: Part Two

 

L. Van Warren

Warren Design Vision

Feb 26, 2002

 

 

Preamble

In part one, the definitions and basic concepts of fact extraction and knowledge mapping were outlined. These are essential elements of a knowledge management infrastructure. They were developed in the context of the biotechnology world. The discussion will now be extended and applied to the areas of missing person investigations, antiterrorism, legal proceedings, forensic investigations, library science and medicine. Finally, these techniques will be discussed within the context of the schematic world. The techniques of fact extraction and knowledge mapping described in part one can be applied to these kinds of problems to increase the probability of a successful outcome and to finish the job of distilling meaning from the mountains of existing data.

 

 

Introduction

Imagine the situation that typically ensues when a person goes missing. First there is a waiting period, usually 24 hours, during which critical time is wasted. Then there is a data solicitation phase, which takes the form of a plea to the media. Following this there is a data gathering phase in which friends, witnesses, investigators and the public provide unfiltered content with varying degrees of relevance and certainty. During this period there is a data overload condition in which the knowledge management infrastructure necessary to distill meaning from the data does not exist. In the data overload condition an avalanche of phone calls, emails, faxes, roadmaps and recorded media accounts appears. After this a filtering strategy is developed which enables investigators, family and friends to prioritize factual data by assigning to it a relevance and certainty. Most serious crimes are perpetrated by someone we know. If the investigators, family members or friends have contributed to the disappearance, they may be in a position to suppress vital factual information. Sometimes the person, alive or dead, is recovered; sometimes not.

 

The Problem

The most pressing problem in these situations is that no knowledge management infrastructure (KMI) exists. A KMI provides knowledge management in an objective, independent and trustworthy way that keeps up with the incoming data stream. Backlogged data that incessantly spews without being processed creates an accumulation of informational litter, an unsightly data junkyard problem that is not consistent with virtual urban planning and a pristine infoscape.


 

Stacking the Deck in Your Favor

Let us apply what we have discovered to the missing person problem. If you want to improve the chances of recovering a missing person you must:

 

1)      Act immediately.

2)      Assemble an expert staff who will convert all phone calls, emails and faxes into plain text format. These converted documents are called plain text messages. All text messages are stamped with the date and time, and with the name and address of the originator if available.

3)      A real time investigator assigns a relevance (priority) and certainty to each text message. A message can be certain but irrelevant. Priority ensures that the most important messages are processed first.

4)      The messages are fact extracted using the method described previously.

5)      The extracted facts are knowledge mapped.

6)      The knowledge map is viewed and interrogated by the investigators.

7)      The knowledge map is filtered and viewed again.

 

Steps 5 through 7 are repeated as more facts come in.  Overfiltering can eliminate important leads. Underfiltering leads to information overload. A real time investigator must be able to set the filtering threshold adaptively and interactively. Cognitive overload occurs when there is pressure to follow more than one line of reasoning simultaneously. Because of cognitive overload it is best to have real time investigators work in teams of two or three.
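
The adaptive threshold described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the TextMessage record, its field names, the 0.0 to 1.0 scales and the filtered_view function are hypothetical and are not part of the method described in part one. The point it shows is that filtering produces a view and never deletes a message, so a threshold that turns out to be too strict can be relaxed without losing a lead.

```python
from dataclasses import dataclass


@dataclass
class TextMessage:
    # Hypothetical record for one plain text message (step 2).
    text: str
    received: str       # date and time stamp
    originator: str     # name and address, or "unknown"
    relevance: float    # priority assigned by the investigator, assumed 0.0 to 1.0
    certainty: float    # assumed 0.0 to 1.0


def filtered_view(messages, min_relevance=0.0, min_certainty=0.0):
    """Return the messages that pass both thresholds, most relevant first.

    The input list is left untouched, so the thresholds can be changed
    interactively and the view rebuilt at any time.
    """
    kept = [
        m for m in messages
        if m.relevance >= min_relevance and m.certainty >= min_certainty
    ]
    # sorted() is stable, so equally relevant messages stay in arrival order.
    return sorted(kept, key=lambda m: m.relevance, reverse=True)
```

A team might start with a high relevance threshold while the backlog is large and lower it as the backlog clears, calling filtered_view again each time.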

 

Knowledge Management in Antiterrorism

Consider the caves of Tora Bora, Afghanistan. Wandering non-systematically from cave to cave saying, “there is no one here” neglects the most precious resource in each cave. That resource is information. In a systematic search each cave is identified, explored, marked and sealed. If the contents of those caves are photographed and logged, one can extract the facts and create knowledge maps based on the artifacts recovered from each cave. Lines of reasoning that connect large concentrations of artifacts imply resource and supply chains. Supply chains indicate footprint lines of resource allocation that were necessary to sustain terrorist operations. These trends cannot be seen without graphing them. Spent ammunition cases and empty boxes provide vital and meaningful data when plotted in this manner.

 

The emergence of knowledge management tools such as fact extraction, knowledge mapping, and knowledge filtering can increase the probability of positive outcomes in these situations.  They cannot guarantee a positive outcome, but because of the power of fact chaining they can increase the likelihood of successful resolution.

 

Remember Only This

The informational structure of the missing person problem, the disease problem, the terrorism problem and the schematic problem is the same.

 

Knowledge Mapping for Legal Proceedings

For the legal profession there are two important proof of concept demonstrations:

 

Demonstration A is a fact map for a whodunit. A whodunit consists of a central crime fact (Johnny is shot) and the facts that relate to the central event. The terms event and fact are synonymous. The related facts include facts about the crime scene and the testimony of investigators and witnesses, some of which may be unreliable. The testimony may be tenuous or even in contradiction, in which case irrelevant or false facts have been included. This could be because the witnesses are criminals themselves, are protecting someone, are in collusion or conspiracy, or have faded or altered memories of the events. Given the presence of significant uncertainty and the number of sources of error, it is worth taking a moment to wonder how many crimes are ever solved correctly. How often does the right person face the judicial system? How do we find the right person? How do we find who shot Johnny?

 

The testimony of investigators and witnesses can be fact extracted and color coded by individual and by type of information. For example, forensic information might be colored red to imply a blood or DNA trail.

 

Demonstration B shows the use of knowledge mapping for building an effective appeal, as in constructing a convincing legal argument using existing case precedent that has been fact extracted and knowledge mapped. In this demonstration one has a knowledge map of a current case and knowledge maps of precedent cases. These knowledge maps are filtered, compared and contrasted. Similar chains of reasoning appearing in different cases, or visual analogies, are extracted, compared and ranked for their degree of similarity, accuracy and significance. An idiomatic expression is a set of one or more linked facts that appear in two or more independent chains of reasoning.
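
The definition of an idiomatic expression can be made concrete with a small Python sketch. It assumes, purely for illustration, that each chain of reasoning is stored as an ordered list of fact strings and that a link is a pair of consecutive facts; the function name shared_links and this representation are hypothetical. It handles only the simplest case, a single link shared between chains, and leaves longer shared sequences and ranking by similarity aside.

```python
from collections import defaultdict


def shared_links(chains):
    """Find links that appear in two or more chains of reasoning.

    chains maps a chain name (for example a case name) to an ordered
    list of facts. A link is a pair (fact, next_fact). The result maps
    each shared link to the set of chain names that contain it.
    """
    seen = defaultdict(set)
    for name, facts in chains.items():
        # Pair each fact with the one that follows it in the chain.
        for link in zip(facts, facts[1:]):
            seen[link].add(name)
    return {link: names for link, names in seen.items() if len(names) >= 2}
```

Given a current case and several precedents, the links returned here are the candidate points of comparison between the knowledge maps.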

 

 

Experiment

One might imagine a fifty page legal pad sitting adjacent to a four by six foot printed knowledge map. The legal pad would contain the plain text of the case while the knowledge map would depict these facts in graphical format.

 

One might imagine two fictitiously identical legal teams. One would have only the legal pad, the other only the knowledge map. We would then ask the question, what advantages might one team have over the other? Consider the following hypothetical expectations:

 

One would expect the team with the knowledge map to more quickly master the central facts of the case and the chains of reasoning emanating from that central fact, and the locations of uncertainty in those chains of reasoning.

 

One would expect the team with the legal pad to comprehend the continuity of the storyline with the emphases that were implied by the sequentiality choices made by the writer of the case.

 

The team with the knowledge map would quickly find the locations of uncertainty which would form the basis for further interviews and cross examination. Evaluating facts in the light of temporal, spatial or extrafactual conditions provides a deterministic method for thorough interview and complete cross examination. The transcripts of these interviews would then be fact extracted and knowledge mapped, to further increase understanding. The greatest leverage is obtained when people and computers divide the work in such a way that each does the work that it is most effective at. A human domain expert is essential in the fact extraction and knowledge mapping to account for the nuances of language.

 

Limitations of the Experiment

There are several limitations in this experiment:

 

1)      A printed knowledge map is static and cannot be filtered for relevant arguments, or thresholded for certain ones. The layout topology of a printed knowledge map cannot be changed to produce more optimal points of view.

 

2)      As the size of a 2-D knowledge map increases, it becomes visually complex and ultimately intractable. As the number of facts increases, edge crossings begin to appear with increasing frequency. These edge crossings introduce clutter and visual ambiguity. The consequence is visual spaghetti that does not clarify understanding without study. Study is sequential and thus suffers from the curse of sequentiality discussed below. One can wonder whether the learning process itself is the interplay between the sequential version of knowledge that appears in spoken language and printed text and the non-sequential version of knowledge that appears in visual imagery and multi-track sounds.
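
How quickly edge crossings accumulate can be illustrated with a short Python sketch. It rests on assumptions that are not in the text: facts are numbered 0 to n-1 and placed in that order around a circle, every link is drawn as a straight chord, and the worst case of every fact linked to every other is used. The function names are hypothetical. Other layouts will produce different counts, so the numbers show the trend only.

```python
from itertools import combinations


def count_crossings(edges):
    """Count pairs of crossing edges in a circular layout.

    Facts are assumed to sit at positions 0..n-1 around a circle, and
    each edge (a, b) is drawn as a straight chord between two positions.
    """
    def crosses(e, f):
        a, b = sorted(e)
        c, d = f
        if len({a, b, c, d}) < 4:
            # Chords that share a fact meet only at that endpoint.
            return False
        # Two chords cross when exactly one end of f lies between a and b.
        return (a < c < b) != (a < d < b)

    return sum(crosses(e, f) for e, f in combinations(edges, 2))


def fully_linked(n):
    """Edges of a map in which each of n facts is linked to every other."""
    return list(combinations(range(n), 2))
```

Under these assumptions a fully linked map has 1 crossing at four facts, 15 at six and 70 at eight.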

 

Extensions of the Experiment

On the other hand:

 

1)      An interactive knowledge map is dynamic and can easily be filtered, thresholded and viewed from multiple points of view in multiple styles of layout.

2)      A 3D knowledge map can be immersive and visually simpler to comprehend. As the facts increase, no edge crossings appear because there is more “room” in 3D. None are necessary. Additional facts fill the viewing volume a few points at a time.

 

A courteous customer will tolerate visual spaghetti, and possibly even receive it with enthusiasm because of the sense of wonder and complexity. But this will wane quickly in time. The result is that only valid knowledge management utilities will stand the test of time.

 


 

The Impact of Knowledge Management Tools

In the context of a larger courtroom case, press accounts would also be extracted, mapped and filtered to remove redundant facts requoted from the original source. The result would enable new information gathered by the press to be included in the knowledge map which would lead to a more comprehensive understanding.

 

This inclusion of powerful knowledge management tools and external visual organizers could change the way the legal process works. The idea of exclusion of evidence could evolve into changing the certainty of evidence based on the source. This could further lead to the automation of some aspects of courtroom proceedings. For example a computational verdict could be rendered and compared with a traditional human computed verdict. This would be most interesting.

 

The Curse of Sequentiality

When composing and comprehending textual information, we typically understand facts in some linear order of presentation. This linear ordering tends to fade over time, making facts presented late in the order have a false prominence compared to facts presented earlier. We rehearse stories to build internal images that equalize the certainty of facts and assign them an appropriate contextual priority. In education we rehearse arguments and in cross examination we discover contradicting facts to build internal images that flatten facts to an appropriate priority. Even so, these rehearsals are often insufficient to communicate the facts in a comprehensive and fair priority because of the curse of sequentiality.

 

Knowledge maps are reflective of the way human memory is organized. Repeated facts, especially those appearing in more than one sensory modality, tend to accumulate increased action potential relative to unrehearsed ones. In this sense a known fact is opaque and immediately available, with an action potential of one. A less certain fact is presented tentatively, with an action potential of less than one.

 

Normalizing Certainty

When facts are extracted from raw literature they are enumerated with some nominal certainty, based on the reliability of the source. Facts are then sorted by entity or relation, using standard nomenclature. When duplicate facts arrive from different sources, the certainty is increased and duplicates are purged. Many rules can be applied for normalizing certainty, but in any case, the certainty of a fact must not exceed one. One may use maximum likelihood estimators, standard deviation and statistical quality control techniques that fold in the number of votes necessary to set the certainty of a fact at a given confidence level. The appearance of contradictory facts reduces the confidence level of all conflicting facts. A single dissenting voice causes the majority to reevaluate its action. This is a potentially large and difficult subject, but it need not reduce the value of knowledge mapping. For simple reasoning, the certainty can be set to 1.0 for all facts and the syllogisms produced by following chains of reasoning can be tested for their validity.
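
One of the many possible rules for merging duplicates can be sketched in Python. The rule used here is an assumption chosen for illustration, not the rule prescribed by this paper: reports are treated as independent, and the merged certainty is one minus the product of the individual doubts. It has the two properties the paragraph asks for, in that each duplicate raises the certainty and the result never exceeds one. The tuple layout (subject, relation, object, certainty) and the name merge_duplicates are also hypothetical. Contradictory facts are not handled.

```python
from collections import defaultdict


def merge_duplicates(facts):
    """Purge duplicate facts and raise the certainty of the survivors.

    facts is an iterable of (subject, relation, object, certainty)
    tuples with certainty between 0.0 and 1.0. Duplicates are combined
    as 1 - product(1 - c), which assumes the reports are independent.
    """
    doubt = defaultdict(lambda: 1.0)  # running product of (1 - certainty)
    for subject, relation, obj, certainty in facts:
        doubt[(subject, relation, obj)] *= 1.0 - certainty
    return {fact: 1.0 - d for fact, d in doubt.items()}
```

For example, two independent reports of the same fact with certainties 0.6 and 0.5 merge to 1 - (0.4 x 0.5) = 0.8.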

 

When pooling facts from different sources, or pooling facts extracted by different domain experts, the certainty of the facts from each pool is the product of the reliability of the source and the reliability of the domain expert, with the uncertainty in the certainty computed as the error rate of the source plus the error rate of the expert. If these error rates are unknown, they are flagged as such. A certified or curated argument is one in which these four numbers are known.
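
The pooling rule above translates directly into Python. The arithmetic follows the paragraph: certainty is the product of the two reliabilities and the uncertainty is the sum of the two error rates. The names PooledCertainty and pool, and the use of None to flag an unknown error rate, are assumptions made for this sketch.

```python
from typing import NamedTuple, Optional


class PooledCertainty(NamedTuple):
    certainty: float
    uncertainty: Optional[float]  # None flags an unknown error rate
    curated: bool                 # True only when all four numbers are known


def pool(source_reliability, expert_reliability,
         source_error_rate=None, expert_error_rate=None):
    """Combine source and domain expert reliability for one pool of facts."""
    certainty = source_reliability * expert_reliability
    if source_error_rate is None or expert_error_rate is None:
        # At least one error rate is unknown, so flag it rather than guess.
        return PooledCertainty(certainty, None, curated=False)
    uncertainty = source_error_rate + expert_error_rate
    return PooledCertainty(certainty, uncertainty, curated=True)
```

A source of reliability 0.9 read by an expert of reliability 0.8 gives a pooled certainty of 0.72. With error rates of 0.05 and 0.10 the uncertainty in that figure is 0.15.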

 

Knowledge Management in Library Science

Old books can be data mined for facts. These facts were thought to be true when the books were written. Some facts remain true; some are replaced with more accurate facts. Watching the evolution of knowledge, culture and dogma could be very informative, i.e. history in action. Discovering a breadth of related facts, not only through various geographic regions but through time, could yield benefits and insights as well. A knowledge map of a family heritage line is called a genealogy. Imagine if a comprehensive genealogy could be assembled from all the books and records in the world such that if you took any two people, you could determine their closest common ancestor. This makes the preservation of historical records and documents a very important activity.
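
The closest common ancestor query can be sketched in Python once the genealogy is held as a knowledge map. The sketch assumes a hypothetical representation in which a dictionary maps each person to a list of known parents, and it takes "closest" to mean the fewest total generations separating the two people from the ancestor. Both choices are illustrative. A person counts as their own ancestor at zero generations, so if one person descends from the other, that person is returned.

```python
from collections import deque


def ancestor_depths(person, parents):
    """Generations from person up to each known ancestor (person is 0)."""
    depths = {person: 0}
    queue = deque([person])
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, ()):
            if parent not in depths:
                depths[parent] = depths[current] + 1
                queue.append(parent)
    return depths


def closest_common_ancestor(a, b, parents):
    """Return the shared ancestor with the fewest total generations, or None."""
    depths_a = ancestor_depths(a, parents)
    depths_b = ancestor_depths(b, parents)
    common = depths_a.keys() & depths_b.keys()
    if not common:
        return None
    # Ties are broken by name so that the answer is repeatable.
    return min(common, key=lambda p: (depths_a[p] + depths_b[p], p))
```

A result of None means only that the records at hand do not connect the two people, which is one reason the completeness of the records matters.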

 

Knowledge Management in Medicine

These same phases and conditions are repeated when a person becomes ill with a serious disease. The progression of some diseases is a consequence of the fact that no personalized knowledge management structure is in place for the specifics of that disease. Cancer is a good example of this. Gene chips can provide information on those genes that, when broken, enable the cellular malfunction of uncontrolled growth called cancer. It is believed that between 200 and 2000 genes are involved in breast cancer. Knowing exactly which genes are broken, and in what way, will enable personal and specific regimens of therapy to be developed and the impact of those regimens to be followed over time.

 

 

Minding the Schematics

Consider the following scenario that frequently ensues when a technical artifact such as an appliance, a washer, dryer, air conditioner, heater,  TV, VCR or computer breaks:

 

A technician unrolls a 2D schematic to determine which components are most likely at fault. In other words, the technician searches the schematic for the component - the entity in a putative fact - causing the failure.

 

During this process the technician establishes a correspondence between the flattened artificial schematic and the actual three dimensional appliance. It may be that a considerable amount of what we call learning occurs in the translation between the schematic and the real object. The schematic is a knowledge map of facts called components.

 

Establishing this correspondence improves with practice but is tedious to learn. The rate at which appliances change makes this approach even less effective. Wouldn’t it be preferable to work instead with an exploded 3D schematic that enabled direct inspection and simulation of the appliance involved? Who might benefit from this technology? The man pictured is in Zambia, one of the poorest nations of the world with an annual income of $200. He is servicing a 25 year old television. Does he deserve better? I think he does. He at least deserves to be able to find the problem with the TV quickly and fix the problem if it is still possible.