Knowledge Mapping the Corpus: Part Two
L. Van Warren
Warren Design Vision
Preamble
In part one, the definitions and basic concepts of fact extraction and knowledge mapping were outlined. These are essential elements of a knowledge management infrastructure, and they were developed in the context of the biotechnology world. The discussion will now be extended and applied to the areas of missing person investigations, antiterrorism, legal proceedings, forensic investigations, library science and medicine. Finally, these techniques will be discussed within the context of the schematic world. The techniques of fact extraction and knowledge mapping described in part one can be applied to these kinds of problems to increase the probability of a successful outcome and to finish the job of distilling meaning from the mountains of existing data.
Introduction
Imagine the situation that typically ensues when a person goes missing. First there is a waiting period, usually 24 hours, during which critical time is wasted. Then there is a data solicitation phase, which takes the form of a plea to the media. Following this there is a data gathering phase in which friends, witnesses, investigators and the public provide unfiltered content with varying degrees of relevance and certainty. During this period there is a data overload condition in which the knowledge management infrastructure necessary to distill meaning from the data does not exist. In the data overload condition an avalanche of phone calls, emails, faxes, roadmaps and recorded media accounts appears. After this a filtering strategy is developed which enables investigators, family and friends to prioritize factual data by assigning to it a relevance and a certainty. Most serious crimes are perpetrated by someone we know. If the investigators, family members or friends have contributed to the disappearance, they may be in a position to suppress vital factual information. Sometimes the person, alive or dead, is recovered; sometimes not.
The Problem
The most pressing problem in these situations is that no knowledge management infrastructure (KMI) exists. A KMI provides knowledge management in an objective, independent and trustworthy way that keeps up with the incoming data stream. Backlogged data that incessantly spews without being processed creates an accumulation of informational litter, an unsightly data junkyard problem that is not consistent with virtual urban planning and a pristine infoscape.
Stacking the Deck in Your Favor
Let us apply what we have discovered to the missing person problem. If you want to improve the chances of recovering a missing person you must:
1) Act immediately.
2) Assemble an expert staff who will convert all phone calls, emails and faxes into plain text format. These converted documents are called plain text messages. All text messages are stamped with date and time, and with the name and address of the originator if available.
3) A real time investigator assigns a relevance (priority) and certainty to each text message. A message can be certain but irrelevant. Priority ensures that the most important messages are processed first.
4) The messages are fact extracted using the method described previously.
5) The extracted facts are knowledge mapped.
6) The knowledge map is viewed and interrogated by the investigators.
7) The knowledge map is filtered and viewed again.
Steps 5 through 7 are repeated as more facts come in. Overfiltering can eliminate important leads. Underfiltering leads to information overload. A real time investigator must be able to set the filtering threshold adaptively and interactively. Cognitive overload occurs when there is pressure to follow more than one line of reasoning simultaneously. Because of cognitive overload it is best to have real time investigators work in teams of two or three.
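The triage and filtering steps above can be sketched in code. What follows is a minimal illustration, not the method from part one: the names `TextMessage`, `TriageQueue` and `filter_messages`, and the use of 0-to-1 scores for relevance and certainty, are assumptions made for this example.

```python
import heapq
from dataclasses import dataclass
from datetime import datetime
from itertools import count


@dataclass
class TextMessage:
    """A plain text message as described in step 2 (hypothetical layout)."""
    body: str
    received: datetime
    originator: str = "unknown"
    relevance: float = 0.0   # priority assigned by the investigator, 0..1 (assumed scale)
    certainty: float = 0.0   # how trustworthy the content is, 0..1 (assumed scale)


class TriageQueue:
    """Hands out the most relevant message first (step 3)."""

    def __init__(self):
        self._heap = []
        self._arrival = count()  # tie-breaker: earlier arrivals come out first

    def push(self, message):
        # heapq is a min-heap, so relevance is negated to pop the highest first.
        heapq.heappush(self._heap, (-message.relevance, next(self._arrival), message))

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


def filter_messages(messages, min_relevance, min_certainty):
    """Keep only messages at or above both thresholds (step 7)."""
    return [m for m in messages
            if m.relevance >= min_relevance and m.certainty >= min_certainty]
```

Because the two thresholds are ordinary arguments, an investigator can raise them when overloaded and lower them when leads run dry, which is the adaptive, interactive filtering the text calls for.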
Knowledge Management in Antiterrorism
Consider the caves of
The emergence of knowledge management tools such as fact extraction, knowledge mapping, and knowledge filtering can increase the probability of positive outcomes in these situations. They cannot guarantee a positive outcome, but because of the power of fact chaining they can increase the likelihood of successful resolution.
Remember Only This
The informational structures of the missing person problem, the disease problem, the terrorism problem and the schematic problem are all the same.
Knowledge Mapping for Legal Proceedings
For the legal profession there are two important proof of concept demonstrations:
Demonstration A is a fact map for a whodunit. A whodunit consists of a central crime fact (Johnny is shot) and the facts that relate to the central event. The terms event and fact are synonymous. The related facts include facts about the crime scene and the testimony of investigators and witnesses, some of which may be unreliable. The testimony may be tenuous or even contradictory, in which case irrelevant or false facts have been included. This could be because the witnesses are criminals themselves, are protecting someone, are in collusion or conspiracy, or have faded or altered memories of the events. Given the presence of significant uncertainty and the number of sources of error, it is worth taking a moment to wonder how many crimes are ever solved correctly. How often does the right person face the judicial system? How do we find the right person? How do we find who shot Johnny?
The testimony of investigators and witnesses can be fact extracted and color coded by individual and by type of information. For example, forensic information might be colored red to imply a blood or DNA trail.
Demonstration B shows the use of knowledge mapping for building an effective appeal, as in constructing a convincing legal argument using existing case precedent that has been fact extracted and knowledge mapped. In this demonstration one has a knowledge map of a current case and knowledge maps for precedent cases. These knowledge maps are filtered, compared and contrasted. Similar chains of reasoning appearing in different cases, or visual analogies, are extracted, compared and ranked for degree of similarity, accuracy and significance. An idiomatic expression is a set of one or more linked facts that appear in two or more independent chains of reasoning.
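One way to picture the comparison of chains of reasoning is the sketch below. It rests on assumptions not found in the text: each chain is an ordered list of fact labels, a link is a pair of consecutive facts, and similarity is the Jaccard overlap of the two link sets. The function names are hypothetical, and the sketch only detects shared pairs of linked facts, the smallest multi-fact case of an idiomatic expression.

```python
def links(chain):
    """Return the set of (fact, next_fact) pairs in one chain of reasoning."""
    return set(zip(chain, chain[1:]))


def shared_links(chain_a, chain_b):
    """Linked fact pairs that appear in both chains."""
    return links(chain_a) & links(chain_b)


def similarity(chain_a, chain_b):
    """Jaccard overlap of the two link sets (an assumed similarity measure)."""
    a, b = links(chain_a), links(chain_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_precedents(current, precedents):
    """Sort precedent chains from most to least similar to the current case.

    `precedents` maps a case name to its chain of fact labels.
    """
    scored = [(similarity(current, chain), name) for name, chain in precedents.items()]
    return sorted(scored, reverse=True)
```

A real comparison would also need to weigh the accuracy and significance mentioned above. This sketch ranks by structural similarity alone.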
Experiment
One might imagine a fifty page legal pad sitting adjacent to a four by six foot printed knowledge map. The legal pad would contain the plain text of the case while the knowledge map would depict these facts in graphical format.
One might imagine two fictitiously identical legal teams. One would have only the legal pad, the other only the knowledge map. We would then ask the question, what advantages might one team have over the other? Consider the following hypothetical expectations:
One would expect the team with the knowledge map to more quickly master the central facts of the case and the chains of reasoning emanating from that central fact, and the locations of uncertainty in those chains of reasoning.
One would expect the team with the legal pad to comprehend the continuity of the storyline with the emphases that were implied by the sequentiality choices made by the writer of the case.
The team with the knowledge map would quickly find the locations of uncertainty which would form the basis for further interviews and cross examination. Evaluating facts in the light of temporal, spatial or extrafactual conditions provides a deterministic method for thorough interview and complete cross examination. The transcripts of these interviews would then be fact extracted and knowledge mapped, to further increase understanding. The greatest leverage is obtained when people and computers divide the work in such a way that each does the work that it is most effective at. A human domain expert is essential in the fact extraction and knowledge mapping to account for the nuances of language.
Limitations of the Experiment
There are several limitations in this experiment:
1) A printed knowledge map is static and cannot be filtered for relevant arguments, or thresholded for certain ones. The layout topology of a printed knowledge map cannot be changed to produce more optimal points of view.
2) As a 2-D knowledge map grows, it becomes visually complex and ultimately intractable. As the number of facts increases, edge crossings appear with increasing frequency. These edge crossings introduce clutter and visual ambiguity. The consequence is visual spaghetti that does not clarify understanding without study. Study is sequential and thus suffers from the curse of sequentiality. One can wonder whether the learning process itself is the interplay between the sequential version of knowledge that appears in spoken language and printed text and the non-sequential version of knowledge that appears in visual imagery and multi-track sounds.
Extensions of the Experiment
On the other hand:
1) An interactive knowledge map is dynamic and can easily be filtered, thresholded and viewed from multiple points of view in multiple styles of layout.
2) A 3D knowledge map can be immersive and visually simpler to comprehend. As the facts increase, no edge crossings appear because there is more “room” in 3D. None are necessary. Additional facts fill the viewing volume a few points at a time.
A courteous customer will tolerate visual spaghetti, and possibly even receive it with enthusiasm because of the sense of wonder and complexity. But this enthusiasm will wane quickly over time. The result is that only valid knowledge management utilities will stand the test of time.
The Impact of Knowledge Management Tools
In the context of a larger courtroom case, press accounts
would also be extracted, mapped and filtered to remove r
This inclusion of powerful knowledge management tools and external visual organizers could change the way the legal process works. The idea of exclusion of evidence could evolve into changing the certainty of evidence based on the source. This could further lead to the automation of some aspects of courtroom proceedings. For example a computational verdict could be rendered and compared with a traditional human computed verdict. This would be most interesting.
The Curse of Sequentiality
When composing and comprehending textual information, we typically understand facts in some linear order of presentation. This linear ordering tends to fade over time, making facts presented late in the order have a false prominence compared to facts presented earlier. We rehearse stories to build internal images that equalize the certainty of facts and assign them an appropriate contextual priority. In
Knowledge maps reflect the way human memory is organized. Repeated facts, especially those appearing in more than one sensory modality, tend to accumulate increased action potential relative to unrehearsed ones. In this sense a known fact is opaque and immediately available, with an action potential of one. A less certain fact is presented tentatively, with an action potential of less than one.
Normalizing Certainty
When facts are extracted from raw literature they are enumerated with some nominal certainty, based on the reliability of the source. Facts are then sorted by entity or relation, using standard nomenclature. When duplicate facts arrive from different sources, the certainty is increased and duplicates are purged. Many rules can be applied for normalizing certainty, but in any case, the certainty of a fact must not exceed one. One may use maximum likelihood estimators, standard deviation and statistical quality control techniques that fold in the number of votes necessary to set the certainty of a fact at a given confidence level. Appearance of contradictory facts r
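As one illustration of merging duplicates while keeping certainty at or below one, the sketch below combines duplicate facts with a "noisy-OR" rule, 1 - (1 - c1)(1 - c2)... This rule is an assumption chosen for the example because it treats sources as independent and can never exceed one. The text does not prescribe it, and the function name and tuple layout are hypothetical.

```python
from collections import defaultdict


def normalize_certainty(facts):
    """Merge duplicate facts and raise their certainty without exceeding one.

    `facts` is an iterable of (entity, relation, value, certainty) tuples,
    with certainty in the range 0..1. Returns a dict keyed by
    (entity, relation, value), sorted by entity and then relation.
    """
    doubt = defaultdict(lambda: 1.0)  # running product of (1 - certainty)
    for entity, relation, value, certainty in facts:
        if not 0.0 <= certainty <= 1.0:
            raise ValueError(f"certainty out of range: {certainty}")
        doubt[(entity, relation, value)] *= 1.0 - certainty
    # Duplicates are purged: one entry per distinct fact remains.
    return {key: 1.0 - d for key, d in sorted(doubt.items())}
```

Under this rule two independent reports of the same fact, each with certainty 0.5, merge into one fact with certainty 0.75.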
When pooling facts from different sources, or pooling facts extracted by different domain experts, the certainty of the facts from each pool is the product of the reliability of the source and the reliability of the domain expert, with the uncertainty in the certainty computed as the error rate of the source plus the error rate of the expert. If these error rates are unknown, they are flagged as such. A certified or curated argument is one in which these four numbers are known.
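The pooling rule can be written down directly. In this minimal sketch the class name `FactPool` and its field names are assumptions for illustration, and an unknown number is represented by `None` so that it stays flagged rather than being silently treated as zero.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FactPool:
    """The four numbers attached to one pool of facts (None means unknown)."""
    source_reliability: Optional[float] = None
    expert_reliability: Optional[float] = None
    source_error_rate: Optional[float] = None
    expert_error_rate: Optional[float] = None

    def certainty(self):
        """Product of source and expert reliability, or None if either is unknown."""
        if self.source_reliability is None or self.expert_reliability is None:
            return None
        return self.source_reliability * self.expert_reliability

    def uncertainty(self):
        """Sum of source and expert error rates, or None if either is unknown."""
        if self.source_error_rate is None or self.expert_error_rate is None:
            return None
        return self.source_error_rate + self.expert_error_rate

    def is_certified(self):
        """True only when all four numbers are known."""
        values = (self.source_reliability, self.expert_reliability,
                  self.source_error_rate, self.expert_error_rate)
        return all(v is not None for v in values)
```

For example, a source reliability of 0.9 and an expert reliability of 0.8 give a pool certainty of about 0.72, and error rates of 0.05 and 0.10 give an uncertainty of about 0.15.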
Knowledge Management in Library Science
Old books can be data mined for facts. These facts were thought to be true when the books were written. Some facts remain true; some are replaced with more accurate facts. Watching the evolution of knowledge, culture and dogma could be very informative, i.e., history in action. Discovering a breadth of related facts, not only across various geographic regions but through time, could yield benefits and insights as well. A knowledge map of a family heritage line is called a genealogy. Imagine if a comprehensive genealogy could be assembled from all the books and records in the world such that if you took any two people, you could determine their closest common ancestor. This makes the preservation of historical records and documents a very important activity.
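The closest common ancestor query can be sketched as a search over a genealogy stored as a mapping from each person to their known parents. The data layout, the function names and the definition of "closest" (fewest total generations from both people, with ties broken alphabetically) are assumptions made for this illustration.

```python
from collections import deque


def ancestor_depths(person, parents):
    """Map each known ancestor of `person` to its generation distance.

    `parents` maps a name to a list of that person's known parents.
    The person counts as their own ancestor at distance 0.
    """
    depths = {person: 0}
    queue = deque([person])
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, ()):
            if parent not in depths:  # breadth-first, so first visit is the shortest path
                depths[parent] = depths[current] + 1
                queue.append(parent)
    return depths


def closest_common_ancestor(a, b, parents):
    """Return the shared ancestor with the fewest total generations, or None."""
    depths_a = ancestor_depths(a, parents)
    depths_b = ancestor_depths(b, parents)
    common = set(depths_a) & set(depths_b)
    if not common:
        return None
    return min(common, key=lambda p: (depths_a[p] + depths_b[p], p))
```

A `None` result means only that the records in hand do not connect the two people, which is one reason the completeness of the preserved records matters.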
Knowledge Management in Medicine
These same phases and conditions are repeated when a person becomes ill with a serious disease. The progression of some diseases is a consequence of the fact that no personalized knowledge management structure is in place for the specifics of that disease. Cancer is a good example of this. Gene chips can provide information on those genes that, when broken, enable the cellular malfunction of uncontrolled growth called cancer. It is believed that between 200 and 2000 genes are involved in breast cancer. Knowing exactly which genes are broken, and in what way, will enable personal and specific regimens of therapy to be developed and the impact of those regimens to be followed over time.
Minding the Schematics
Consider the following scenario that frequently ensues when a technical artifact such as an appliance, a washer, dryer, air conditioner, heater, TV, VCR or computer breaks:
A technician unrolls a 2D schematic to determine which components are most likely at fault. In other words, the technician searches the schematic for the component - the entity in a putative fact - causing the failure.
During this process the technician establishes a correspondence between the flattened artificial schematic and the actual three dimensional appliance. It may be that a considerable amount of what we call learning occurs in the translation between the schematic and the real object. The schematic is a knowledge map of facts called components.

Establishing this correspondence improves with practice but is tedious to learn. The rate at which appliances change makes this approach even less effective. Wouldn’t it be preferable to work instead with an exploded 3D schematic that enabled direct inspection and simulation of the appliance involved? Who might benefit from this technology? The man pictured is in