Sunday, November 11, 2007

Artificial intelligence Section C

Expert Systems
Learning Objectives
After reading this unit you should appreciate the following:
Need and Justification of Expert Systems
Knowledge Acquisition
Case Studies
MYCIN
R1
Need and Justification of Expert Systems
This unit describes the basic architecture of knowledge-based systems, with emphasis placed on expert systems. Expert systems are a recent product of artificial intelligence. They began to emerge as university research systems during the early 1970s and have since become one of the most important innovations of AI, since they have been shown to be successful commercial products as well as interesting research tools.
Expert systems have proven to be effective in a number of problem domains that normally require the kind of intelligence possessed by a human expert. The areas of application are almost endless. Wherever human expertise is needed to solve a problem, expert systems are likely to be among the options sought. Application domains include law, chemistry, biology, engineering, manufacturing, aerospace, military operations, finance, banking, meteorology, geology, geophysics, and more. The list goes on and on.
In this chapter we explore expert system architectures and related building tools. We also look at a few of the more important application areas. The material is intended to acquaint the reader with the basic concepts underlying expert systems and to provide enough of the fundamentals needed to build basic systems, pursue further studies, and conduct research in the area.
An expert system is a set of programs that manipulate encoded knowledge to solve problems in a specialized domain that normally requires human expertise. An expert system's knowledge is obtained from expert sources and coded in a form suitable for the system to use in its inference or reasoning processes. The expert knowledge must be obtained from specialists or other sources of expertise, such as texts, journal articles, and databases. This type of knowledge usually requires a great deal of training and experience in a specialized field such as medicine, geology, system configuration, or engineering design. Once a sufficient body of expert knowledge has been acquired, it must be encoded into a knowledge base, then tested and refined continually throughout the life of the system.
Characteristic Features of Expert Systems
Expert systems differ from conventional computer systems in several important ways.
1. Expert systems use knowledge rather than data to control the solution process. “In the knowledge lies the power” is a theme repeatedly followed and supported through this book. Much of the knowledge used is heuristic in nature rather than algorithmic.
2. The knowledge is encoded and maintained as an entity separate from the control program. As such, it is not compiled together with the control program itself. This permits the incremental addition and modification (refinement) of the knowledge base without recompilation of the control programs. Furthermore, it is possible in some cases to use different knowledge bases with the same control programs to produce different types of expert systems. Such systems are known as expert system shells, since they may be loaded with different knowledge bases.
3. Expert systems are capable of explaining how a particular conclusion was reached, and why requested information is needed during a consultation. This is important, as it gives the user a chance to assess and understand the system's reasoning ability, thereby improving the user's confidence in the system.
4. Expert systems use symbolic representations for knowledge (rules, networks, or frames) and perform their inference through symbolic computations that closely resemble manipulations of natural language. (An exception to this is the expert system based on neural network architectures.)
5. Expert systems often reason with metaknowledge (knowledge about knowledge), that is, knowledge about their own knowledge and its limitations.
MYCIN
The development of MYCIN began at Stanford University. MYCIN is an expert system which diagnoses infectious blood diseases and determines a recommended list of therapies for the patient. As part of the Heuristic Programming Project at Stanford, several projects directly related to MYCIN were also completed, including a knowledge acquisition component called TEIRESIAS, a tutorial component called GUIDON, and a shell component called EMYCIN (for Essential MYCIN). EMYCIN was used to build other diagnostic systems including PUFF, a diagnostic expert for pulmonary diseases. EMYCIN also became the design model for several commercial expert system building tools.
MYCIN's performance improved significantly over a period of several years as additional knowledge was added. Tests indicate that MYCIN's performance now equals or exceeds that of experienced physicians. The initial MYCIN knowledge base contained only about 200 rules. This number was gradually increased to more than 600 rules by the early 1980s. The added rules significantly improved MYCIN's performance, leading to a 65% success record that compared favorably with experienced physicians, who demonstrated only an average 60% success rate.

Subgoaling in MYCIN
MYCIN is a heterogeneous program, consisting of many different modules. Part of MYCIN's control structure performs a quasi-diagnostic function. But the goals to be achieved are not physical goals, involving the movement of objects in space, but reasoning goals that involve the establishment of diagnostic hypotheses.
This section concentrates upon the diagnostic module of MYCIN, giving a simplified account of its function, structure and runtime behavior.
Treating blood infections
Firstly, we need to give a brief description of MYCIN's domain: the treatment of blood infections. This description presupposes no specialized medical knowledge on the part of the reader. But, as with any expert system, some understanding of the domain is crucial to understanding what the program does.
An 'anti-microbial agent' is any drug designed to kill bacteria or arrest their growth. Some agents are too toxic for therapeutic purposes, and there is no single agent effective against all bacteria. The selection of therapy for bacterial infection can be viewed as a four-part decision process:
Deciding if the patient has a significant infection;
Determining the (possible) organism(s) involved;
Selecting a set of drugs that might be appropriate;
Choosing the most appropriate drug or combination of drugs.
Samples taken from the site of infection are sent to a microbiology laboratory for culture, that is, an attempt to grow organisms from the sample in a suitable medium.
Early evidence of growth may allow a report of the morphological or staining characteristics of the organism. However, even if an organism is identified, the range of drugs it is sensitive to may be unknown or uncertain.
MYCIN is often described as a diagnostic program, but this is not so. Its purpose is to assist a physician who is not an expert in the field of antibiotics with the treatment of blood infections. In doing so, it develops diagnostic hypotheses and weights them, but it need not necessarily choose between them. Work on MYCIN began in 1972 as a collaboration between the medical and AI communities at Stanford University. The most complete single account of this work is Shortliffe (1976).
There have been a number of extensions, revisions and abstractions of MYCIN since 1976, but the basic version has five components, shown in Figure 8.1 along with the basic pattern of information flow between the modules.
(1) A knowledge base, which contains factual and judgmental knowledge about the domain.
(2) A dynamic patient database containing information about a particular case.
(3) A consultation program, which asks questions, draws conclusions, and gives advice about a particular case based on the patient data and the static knowledge.
(4) An explanation program, which answers questions and justifies its advice, using static knowledge and a trace of the program’s execution.
(5) A knowledge acquisition program for adding new rules and changing existing ones.
The system consisting of components (1)-(3) is the problem-solving part of MYCIN; it generates hypotheses with respect to the offending organisms and makes therapy recommendations based on these hypotheses.

Figure 8.1: Organization of MYCIN
MYCIN's knowledge base
MYCIN's knowledge base is organized around a set of rules of the general form
if condition_1 and ... and condition_m hold
then draw conclusion_1 and ... and conclusion_n
encoded as data structures of the LISP programming language.
Figure 8.2 shows the English translation of a typical MYCIN rule for inferring the class of an organism. This translation is provided by the program itself. Such rules are called ORGRULES, and they attempt to cover such organisms as streptococcus, pseudomonas, and entero-bacteria.
The rule says that if an isolated organism appears rod-shaped, stains in a certain way, and grows in the presence of oxygen, then it is more likely to be in the class entero-bacteria. The number 0.8 is called the tally of the rule, which says how certain the conclusion is, given that the conditions are satisfied. The use of the tally is explained below. Each rule of this kind can be thought of as encoding a piece of human knowledge whose applicability depends only upon the context established by the conditions of the rule.
The conditions of a rule can also be satisfied with varying degrees of certainty; the import of such rules is roughly as follows:
if condition_1 holds with certainty x_1 ... and condition_m holds with certainty x_m
then draw conclusion_1 with certainty y_1 and ... and conclusion_n with certainty y_n
where the certainty associated with each conclusion is a function of the combined certainties of the conditions and the tally, which is meant to reflect our degree of confidence in the application of the rule.
In summary, a rule is a premise-action pair; such rules are sometimes called 'productions' for purely historical reasons. Premises are conjunctions of conditions, and their certainty is a function of the certainty of those conditions. Conditions are either propositions, which evaluate to truth or falsehood with some degree of certainty (for example 'the organism is rod-shaped'), or disjunctions of such conditions. Actions are either conclusions to be drawn with some appropriate degree of certainty, for example the identity of some organism, or instructions to be carried out, for example compiling a list of therapies.
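To make the premise-action structure concrete, here is a minimal sketch, in Python rather than MYCIN's actual LISP, of how such a rule might be held as data separate from the interpreter. The field names are illustrative, not MYCIN's real structures.

```python
from dataclasses import dataclass

# A condition pairs a clinical parameter with an expected value,
# e.g. ("morphology", "rod"); its certainty comes from the case data.
@dataclass
class Rule:
    conditions: list[tuple[str, str]]  # conjunction of (parameter, value)
    conclusion: tuple[str, str]        # (parameter, value) to conclude
    tally: float                       # confidence in the rule itself

# The entero-bacteria ORGRULE discussed above, written as data:
orgrule = Rule(
    conditions=[("stain", "gramneg"),
                ("morphology", "rod"),
                ("aerobicity", "aerobic")],
    conclusion=("class", "enterobacteriaceae"),
    tally=0.8,
)
```

Keeping rules as plain data like this is what makes the knowledge base separable from the control program, as noted earlier.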
We will explore the details of how rules are interpreted and scheduled for application in the following sections, but first we must look at MYCIN's other structures for representing medical knowledge.
IF 1) The stain of the organism is gramneg, and
2) The morphology of the organism is rod, and
3) The aerobicity of the organism is aerobic
THEN There is strongly suggestive evidence (.8) that
the class of the organism is entero-bacteria
A MYCIN ORGRULE for drawing the conclusion enterobacteriaceae
In addition to rules, the knowledge base also stores facts and definitions in various forms:
simple lists, for example the list of all organisms known to the system;
knowledge tables, which contain records of certain clinical parameters and the values they take under various circumstances, for example the morphology (structural shape) of every bacterium known to the system;
a classification system for clinical parameters according to the context in which they apply, for example whether they are attributes of patients or organisms.
Much of the knowledge not contained in the rules resides in the properties associated with the 65 clinical parameters known to MYCIN. For example, shape is an attribute of organisms which can take on various values, such as 'rod' and 'coccus'. Parameters are also assigned properties by the system for its own purposes. The main ones either (i) help to monitor the interaction with the user, or (ii) provide indexes which guide the application of rules.
Patient information is stored in a structure called the context tree, which serves to organize case data. Figure on next page shows a context tree representing a particular patient, PATIENT-1, with three associated cultures (samples, such as blood samples, from which organisms may be isolated) and a recent operative procedure that may need to be taken into account (for example, because drugs were involved, or because the procedure involves particular risks of infection). Associated with cultures are organisms that are suggested by laboratory data, and associated with organisms are drugs that are effective against them.
Imagine that we have the following data stored in a record structure associated with the node for ORGANISM-1:
GRAM = (GRAMNEG 1.0)
MORPH = (ROD .8) (COCCUS .2)
AIR = (AEROBIC .6)
with the following meaning:
the Gram stain of ORGANISM-1 is definitely Gram negative;
ORGANISM-1 has a rod morphology with certainty 0.8 and a coccus morphology with certainty 0.2;
ORGANISM-1 is aerobic (grows in air) with certainty 0.6.

Figure 8.2: A typical MYCIN context tree
Suppose now that the rule of conclusion above is applied. We want to compute the certainty that all three conditions of the rule
IF 1) the stain of the organism is gramneg, and
2) the morphology of the organism is rod, and
3) the aerobicity of the organism is aerobic
THEN there is strongly suggestive evidence (0.8) that the class of the organism is entero-bacteria.
are satisfied by the data. The certainty of the individual conditions is 1.0, 0.8 and 0.6 respectively, and the certainty of their conjunction is taken to be the minimum of their individual certainties, hence 0.6.
The idea behind taking the minimum is that we are only confident in a conjunction of conditions to the extent that we are confident in its least certain element. This is rather like saying that a chain is only as strong as its weakest link. By an inverse argument, our confidence in a disjunction of conditions is as strong as the strongest alternative; that is, we take the maximum. This convention forms part of a style of inexact reasoning called fuzzy logic.
In this case, we draw the conclusion that the class of the organism is entero-bacteria with a degree of certainty equal to
0.6 x 0.8 = 0.48
The 0.6 represents our degree of certainty in the conjoined conditions, while the 0.8 stands for our degree of certainty in the rule application. These degrees of certainty are called certainty factors (CFs). Thus, in the general case,
CF(action) = CF(premise) x CF(rule).
We return to the broader topic of how to represent uncertainty later. It turns out that the CF model is not always in agreement with the theory of probability; in other words, it is not always correct from a mathematical point of view. However, the computation of certainty factors is much more tractable than the computation of the corresponding probabilities, and the deviation does not appear to be very great in the MYCIN application.
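The arithmetic just described is simple enough to state in a few lines. The following Python sketch (a paraphrase for illustration, not MYCIN's implementation) evaluates the entero-bacteria rule against the ORGANISM-1 data above: the certainty of the conjoined premise is the minimum over the conditions, and the conclusion's CF is that minimum times the tally.

```python
# Case data for ORGANISM-1: parameter -> {value: certainty}
organism_1 = {
    "stain":      {"gramneg": 1.0},
    "morphology": {"rod": 0.8, "coccus": 0.2},
    "aerobicity": {"aerobic": 0.6},
}

def cf_premise(data, conditions):
    """Conjunction: only as certain as the least certain condition."""
    return min(data.get(p, {}).get(v, 0.0) for p, v in conditions)

def cf_conclusion(data, conditions, tally):
    """CF(action) = CF(premise) x CF(rule)."""
    return cf_premise(data, conditions) * tally

conditions = [("stain", "gramneg"),
              ("morphology", "rod"),
              ("aerobicity", "aerobic")]
print(cf_conclusion(organism_1, conditions, 0.8))  # min(1.0, 0.8, 0.6) * 0.8 = 0.48
```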
MYCIN’s control structure
MYCIN has a top-level goal rule which defines the whole task of the consultation system; it is paraphrased below:
IF 1) there is an organism which requires therapy and
2) consideration has been given to any other organisms requiring therapy
THEN compile a list of possible therapies, and determine the best one in this list.
A consultation session follows a simple two-step procedure:
• create the patient context as the top node in the context tree;
• attempt to apply the goal rule to this patient context.
Applying the rule involves evaluating its premise, which involves finding out if there is indeed an organism which requires therapy. In order to find this out, it must first find out if there is indeed an organism present which is associated with a significant disease. This information can either be obtained from the user directly, or via some chain of inference based on symptoms and laboratory data provided by the user.
The consultation is essentially a search through a tree of goals. The top goal at the root of the tree is the action part of the goal rule, that is, the recommendation of a drug therapy. Subgoals further down the tree include determining the organism involved and seeing if it is significant. Many of these subgoals have subgoals of their own, such as finding out the stain properties and morphology of an organism. The leaves of the tree are fact goals, such as laboratory data, which cannot be deduced.
A special kind of structure, called an AND/OR tree, is very useful for representing the way in which goals can be expanded into subgoals by a program. The basic idea is that root node of the tree represents the main goal, terminal nodes represent primitive actions that can be carried out, while non-terminal nodes represent subgoals that are susceptible to further analysis. There is a simple correspondence between this kind of analysis and the analysis of rule sets.
Consider the following set of condition-action rules:
if X has BADGE and X has GUN, then X is POLICE
if X has REVOLVER or X has PISTOL or X has RIFLE, then X has GUN
if X has SHIELD, then X has BADGE
We can represent this rule set in terms of a tree of goals, so long as we maintain the distinction between conjunctions and disjunctions of subgoals. Thus, we draw an arc between the links connecting the nodes BADGE and GUN with the node POLICE, to signify that both subgoals BADGE and GUN must be satisfied in order to satisfy the goal POLICE. However, there is no arc between the links connecting REVOLVER, PISTOL and RIFLE with GUN, because satisfying any one of these will satisfy GUN. Subgoals such as BADGE can have a single child, SHIELD, signifying that a shield counts as a badge.
The AND/OR tree in Figure 8.3 can be thought of as a way of representing the search space for POLICE, by enumerating the ways in which different operators can be applied in order to establish POLICE as true.

Figure 8.3: Representing a rule set as an AND/OR tree
This kind of control structure is called backward chaining, since the program reasons backward from what it wants to prove towards the facts that it needs, rather than reasoning forward from the facts that it possesses. In MYCIN, goals were achieved by breaking them down into subgoals to which operators could be applied. Searching for a solution by backward reasoning is generally more focused than forward chaining, as we saw earlier, since one only considers potentially relevant facts.
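The POLICE example can be run as a toy backward chainer. This sketch treats goals as atoms, ignoring the variable X and any certainties, but it shows the AND/OR structure and the reasoning from goal back toward known facts:

```python
# OR over the alternatives for a goal; AND within each alternative.
# Goals with no rules are leaf "fact goals" answered by observation.
rules = {
    "POLICE": [["BADGE", "GUN"]],
    "GUN":    [["REVOLVER"], ["PISTOL"], ["RIFLE"]],
    "BADGE":  [["SHIELD"]],
}
known_facts = {"SHIELD", "PISTOL"}

def prove(goal):
    """Backward chaining: work from the goal toward known facts."""
    if goal in known_facts:
        return True
    return any(all(prove(sub) for sub in conjunct)
               for conjunct in rules.get(goal, []))

print(prove("POLICE"))  # True: SHIELD establishes BADGE, PISTOL establishes GUN
```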
MYCIN's control structure uses an AND/OR tree, and is quite simple as AI programs go:
(1) Each subgoal set up is always a generalized form of the original goal. So, if the subgoal is to prove the proposition that the identity of the organism is E. coli, then the subgoal actually set up is to determine the identity of the organism. This initiates an exhaustive search on a given topic, which collects all of the available evidence about organisms.
(2) Every rule relevant to the goal is used, unless one of them succeeds with certainty. If more than one rule suggests a conclusion about a parameter, such as the nature of the organism, then their results are combined. If the evidence about a hypothesis falls between -0.2 and +0.2, it is regarded as inconclusive, and the answer is treated as unknown.
(3) If the current subgoal is a leaf node, then attempt to satisfy the goal by asking the user for data. Else set up the subgoal for further inference, and go to (1).
The selection of therapy takes place after this diagnostic process has run its course. It consists of two phases: selecting candidate drugs, and then choosing a preferred drug, or combination of drugs, from this list.
Evidence Combination
In MYCIN, two or more rules might draw conclusions about a parameter with different weights of evidence. Thus one rule might conclude that the organism is E. coli with a certainty of 0.8, while another might conclude from other data that it is E. coli with a certainty of 0.5 or -0.8. In the case of a certainty less than zero, the evidence is actually against the hypothesis.
Let X and Y be the weights derived from the application of different rules. MYCIN combines these weights using the following formula to yield a single certainty factor:

CF(X, Y) = X + Y(1 - X)                      if X >= 0 and Y >= 0
CF(X, Y) = X + Y(1 + X)                      if X < 0 and Y < 0
CF(X, Y) = (X + Y) / (1 - min(|X|, |Y|))     otherwise

where |X| denotes the absolute value of X.
One can see what is happening on an intuitive basis. If the two pieces of evidence both confirm (or disconfirm) the hypothesis, then confidence in the hypothesis goes up (or down). If the two pieces of evidence are in conflict, then the denominator dampens the effect.
This formula can be applied more than once, if several rules draw conclusions about the same parameter. It is commutative, so it does not matter in what order weights are combined.
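In code, the combination rule is a single function. The sketch below assumes the standard MYCIN formulation given above; the printed examples confirm the behavior just described, including commutativity.

```python
def combine(x, y):
    """Combine two certainty factors bearing on the same hypothesis."""
    if x >= 0 and y >= 0:          # both confirm: confidence rises
        return x + y * (1 - x)
    if x < 0 and y < 0:            # both disconfirm: confidence falls
        return x + y * (1 + x)
    # Conflicting evidence: the denominator dampens the effect
    return (x + y) / (1 - min(abs(x), abs(y)))

print(combine(0.8, 0.5))    # 0.9: two confirming rules reinforce each other
print(combine(0.5, 0.8))    # 0.9: same result, so order does not matter
print(combine(0.8, -0.5))   # 0.6: conflicting evidence is dampened
```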
IF the identity of the organism is pseudomonas
THEN I recommend therapy from among the following drugs:
1 COLISTIN (.98)
2 POLYMYXIN (.96)
3 GENTAMICIN (.96)
4 CARBENICILLIN (.65)
5 SULFISOXAZOLE (.64)

A MYCIN therapy rule
The special goal rule at the top of the AND/OR tree does not lead to a conclusion, but instigates actions, assuming that the conditions in the premise are satisfied. At this point, MYCIN's therapy rules for selecting drug treatments come into play; they contain sensitivity information for the various organisms known to the system. A sample therapy rule is given above.
The numbers associated with the drugs are the probabilities that a pseudomonas will be sensitive to the indicated drug according to medical statistics. The preferred drug is selected from the list according to criteria that attempt to screen for contra-indications of the drug and minimize the number of drugs administered, in addition to maximizing sensitivity. The user can go on asking for alternative therapies until MYCIN runs out of options, so the pronouncements of the program are not definitive.
Applications of Expert System
Since the introduction of these early expert systems, the range and depth of applications has broadened dramatically. Applications can now be found in almost all areas of business and government. They include such areas as
Different types of medical diagnoses (internal medicine, pulmonary diseases, infectious blood diseases, and so on)
Diagnosis of complex electronic and electromechanical systems
Diagnosis of diesel-electric locomotive systems
Diagnosis of software development projects.
Planning experiments in biology, chemistry, and molecular genetics
Forecasting crop damage
Identification of chemical compound structures and chemical compounds.
Location of faults in computer and communications systems
Scheduling of customer orders, job shop production operations, computer resources for operating systems, and various manufacturing tasks.
Evaluation of loan applicants for lending institutions
Assessment of geologic structures from dipmeter logs.
Analysis of structural systems for design or as a result of earthquake damage
The optimal configuration of components to meet given specifications for a complex system (like computers or manufacturing facilities)
Estate planning for minimal taxation and other specified goals.
Stock and bond portfolio selection and management
The design of very large scale integration (VLSI) systems
Numerous military applications ranging from battlefield assessment to ocean surveillance.
Numerous applications related to space planning and exploration
Numerous areas of law including civil case evaluation, product liability, assault and battery, and general assistance in locating different law precedents.
Planning curricula for students.
Teaching students specialized tasks (like trouble-shooting equipment faults)
Importance of Expert Systems
The value of expert systems was well established by the early 1980s. A number of successful applications had been completed by then, and they proved to be cost effective. An example which illustrates this point well is the diagnostic system developed by the Campbell Soup Company.
Campbell Soup uses large sterilizers, or cookers, to cook soups and other canned products at eight plants located throughout the country. Some of the larger cookers hold up to 68,000 cans of food for short periods of cooking time. When difficult maintenance problems occur with the cookers, the fault must be found and corrected quickly or the batch of food being prepared will spoil. Until recently, the company had been depending on a single expert to diagnose and cure the more difficult problems, flying him to the site when necessary. Since this individual would retire in a few years, taking his expertise with him, the company decided to develop an expert system to diagnose these difficult problems.
After some months of development with assistance from Texas Instruments, the company developed an expert system, which ran on a PC. The system has about 150 rules in its knowledge base to diagnose the more complex cooker problems. The system has also been used to provide training to new maintenance personnel. Cloning multiple copies for each of the eight locations cost the company only a few pennies per copy. Furthermore, the system cannot retire, and its performance can continue to be improved with the addition of more rules. It has already proven to be a real asset to the company. Similar cases now abound in many diverse organizations.
Representing and Using Domain Knowledge
Expert systems are complex AI programs. However, the most widely used way of representing domain knowledge in expert systems is as a set of production rules, which are often coupled with a frame system that defines the objects that occur in the rules. Let's look at a few additional examples drawn from some other representative expert systems. All the rules we show are English versions of the actual rules that the systems use. Differences among these rules illustrate some of the important differences in the ways that expert systems operate.
R1
R1 (sometimes also called XCON) is a program that configures DEC VAX systems. Its rules look like this:
If: the most current active context is distributing massbus devices, and
there is a single-port disk drive that has not been assigned to a massbus, and
the number of devices that each massbus should support is known, and
there is a massbus that has been assigned at least one disk drive and that should support additional disk drives, and
the type of cable needed to connect the disk drive to the previous device on the massbus is known
then: assign the disk drive to the massbus.
Notice that R1's rules, unlike MYCIN's, contain no numeric measures of certainty. In the task domain with which R1 deals, it is possible to state exactly the correct thing to be done in each particular set of circumstances (although it may require a relatively complex set of antecedents to do so). One reason for this is that there exists a good deal of human expertise in this area. Another is that since R1 is doing a design task (in contrast to the diagnosis task performed by MYCIN), it is not necessary to consider all possible alternatives; one good one is enough. As a result, probabilistic information is not necessary in R1.
PROSPECTOR is a program that provides advice on mineral exploration. Its rules look like this:
If: magnetite or pyrite in disseminated or veinlet form is present
then (2, -4) there is favourable mineralization and texture for the propylitic stage.
In PROSPECTOR, each rule contains two confidence estimates. The first indicates the extent to which the presence of the evidence described in the condition part of the rule suggests the validity of the rule's conclusion. In the PROSPECTOR rule shown above, the number 2 indicates that the presence of the evidence is mildly encouraging. The second confidence estimate measures the extent to which the evidence is necessary to the validity of the conclusion, or, stated another way, the extent to which the lack of the evidence indicates that the conclusion is not valid. In the example rule shown above, the number -4 indicates that the absence of the evidence is strongly discouraging for the conclusion.
DESIGN ADVISOR is a system that critiques chip designs. Its rules look like:
If the sequential level count of ELEMENT is greater than 2, UNLESS the signal of ELEMENT is resetable
then Critique for poor resetability
DEFEAT Poor resetability of ELEMENT
due to Sequential level count of ELEMENT greater than 2
by ELEMENT is directly resetable
The DESIGN ADVISOR gives advice to a chip designer, who can accept or reject the advice. If the advice is rejected, the system can exploit a justification-based truth maintenance system to revise its model of the circuit. The first rule shown here says that an element should be criticized for poor resetability if its sequential level count is greater than two, unless its signal is currently believed to be resetable. Resetability is a fairly common condition, so it is mentioned explicitly in this first rule. But there is also a much less common condition, called direct resetability. The DESIGN ADVISOR does not even bother to consider that condition unless it gets in trouble with its advice. At that point, it can exploit the second of the rules shown above. Specifically, if the chip designer rejects a critique about resetability and if that critique was based on a high level count, then the system will attempt to discover (possibly by asking the designer) whether the element is directly resetable. If it is, then the original rule is defeated and the conclusion withdrawn.

Reasoning with the Knowledge
As these example rules have shown, expert systems exploit many of the representation and reasoning mechanisms that we have discussed. Because these programs are usually, written primarily as rule-based systems, forward chaining, backward chaining, or some combination of the two, is usually used. For example, MYCIN used backward chaining to discover what organisms were present; then it used forward chaining to reason from the organisms to a treatment regime. RI, on the other hand, used forward chaining. As the field of expert systems matures, more systems that exploit other kinds of reasoning mechanisms are being developed. The DESIGN ADVISOR is an example of such a system; in addition to exploiting rules, it makes extensive use of a justification-based truth maintenance system.
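For contrast with the backward chainer sketched earlier, here is a minimal forward chainer in the R1 style: conditions are matched against working memory, and firing a rule adds new elements until nothing more applies. This is only a sketch with invented working-memory elements; R1 itself was built in the OPS5 production-system language, whose rule matching is far more sophisticated.

```python
# Each rule: (set of conditions, element to add when they all hold)
rules = [
    ({"drive unassigned", "massbus has capacity"}, "assign drive to massbus"),
    ({"assign drive to massbus", "cable type known"}, "order cable"),
]

def forward_chain(rules, memory):
    """Reason forward from the facts in working memory."""
    memory = set(memory)
    fired = True
    while fired:
        fired = False
        for conditions, action in rules:
            if conditions <= memory and action not in memory:
                memory.add(action)   # the action extends working memory
                fired = True
    return memory

print(forward_chain(rules, {"drive unassigned",
                            "massbus has capacity",
                            "cable type known"}))
```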
Expert System Shells
Initially, each expert system that was built was created from scratch, usually in LISP. But, after several systems had been built this way, it became clear that these systems often had a lot in common. In particular, since the systems were constructed as a set of declarative representations (mostly rules) combined with an interpreter for those representations, it was possible to separate the interpreter from the domain-specific knowledge and thus to create a system that could be used to construct new expert systems by adding new knowledge corresponding to the new problem domain. The resulting interpreters are called shells. One influential example of such a shell is EMYCIN (for Empty MYCIN), which was derived from MYCIN.
There are now several commercially available shells that serve as the basis for many of the expert systems currently being built. These shells provide much greater flexibility in representing knowledge and in reasoning with it than MYCIN did. They typically support rules, frames, truth maintenance systems, and a variety of other reasoning mechanisms.
Early expert system shells provided mechanisms for knowledge representation, reasoning, and explanation. Later, tools for knowledge acquisition were added. Expert system shells needed to do something else as well. They needed to make it easy to integrate expert systems with other kinds of programs. Expert systems cannot operate in a vacuum, any more than their human counterparts can. They need access to corporate databases, and access to them needs to be controlled just as it does for other systems. They are often embedded within larger application programs that use primarily conventional programming techniques. So one of the important features that a shell must provide is an easy-to-use interface between an expert system that is written with the shell and a larger, probably more conventional, programming environment.
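The shell idea is exactly the separation visible in the sketches above: the interpreter is domain-independent, and only the knowledge base changes. A toy illustration, with hypothetical rule sets standing in for real medical and configuration knowledge:

```python
class Shell:
    """A generic forward-chaining interpreter; knowledge is plugged in."""
    def __init__(self, rules):
        self.rules = rules  # the interchangeable knowledge base

    def run(self, facts):
        facts = set(facts)
        fired = True
        while fired:
            fired = False
            for conditions, action in self.rules:
                if conditions <= facts and action not in facts:
                    facts.add(action)
                    fired = True
        return facts

# The same shell loaded with two different knowledge bases:
diagnosis = Shell([({"fever", "rash"}, "consider measles")])
configure = Shell([({"drive unassigned", "bus free"}, "assign drive")])
print(diagnosis.run({"fever", "rash"}))
print(configure.run({"drive unassigned", "bus free"}))
```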
Student Activity 8.1
Before reading the next section, answer the following questions.
What problems does an expert system solve?
Discuss the characteristics and applications of Expert Systems.
Distinguish between R1 and MYCIN.
If your answers are correct, then proceed to the next section.
Explanation
In order for an expert system to be an effective tool, people must be able to interact with it easily. To facilitate this interaction, the expert system must have the following two capabilities in addition to the ability to perform its underlying task:
1. Explain its reasoning. In many of the domains in which expert systems operate, people will not accept results unless they have been convinced of the accuracy of the reasoning process that produced those results. This is particularly true, for example, in medicine, where a doctor must accept ultimate responsibility for a diagnosis, even if that diagnosis was arrived at with considerable help from a program. Thus it is important that the reasoning process used in such programs proceed in understandable steps and that enough meta-knowledge (knowledge about the reasoning process) be available so the explanations of those steps can be generated.
2. Acquire new knowledge and modifications of old knowledge. Since expert systems derive their power from the richness of the knowledge bases they exploit, it is extremely important that those knowledge bases be as complete and as accurate as possible. But often there exists no standard codification of that knowledge; rather it exists only inside the heads of human experts. One way to get this knowledge into a program is through interaction with the human expert. Another way is to have the program learn expert behavior from raw data.
TEIRESIAS was the first program to support explanation and knowledge acquisition. TEIRESIAS served as a front-end for the MYCIN expert system. A fragment of a TEIRESIAS-MYCIN conversation with a user (a doctor) is shown in figure 8.4. The program has asked for a piece of information that it needs in order to continue its reasoning. The doctor wants to know why the program wants the information, and later asks how the program arrived at a conclusion that it claimed it had reached.

Is the patient’s illness with Organism-1 a hospital-acquired infection?
**Why
[i.e., Why is it important to determine whether or not the infection with Organism-1 was acquired while the patient was hospitalized?]
[2.0] This will aid in determining the category of
Organism-1. It has already been established that
[2.1] the Gram stain of Organism-1 is gramneg, and
[2.2] the morphology of Organism-1 is rod, and
[2.3] the aerobicity of Organism-1 is facultative. Therefore, if
[2.4] the infection with Organism-1 was not acquired while the patient was hospitalized, then

There is strongly suggestive evidence (.8) that the category of Organism-1 is entero-bacteria.

[Also: there is weakly suggestive evidence (.1) that the identity of Organism-1 is pseudomonas-aeruginosa] [Rule037]
** How [2.3]
[i.e., How was it established that the aerobicity of Organism-1 is facultative?]
The following were used:
[3.1] Rule027 indicated there is strongly suggestive evidence (.8) that the aerobicity of Organism-1 is facultative.
[3.2] Rule033 indicated there is weakly suggestive evidence (.2) that the aerobicity of Organism-1 is not facultative.

Since this gave a cumulative CF of (.6) for facultative, the evidence indicates that the aerobicity of Organism-1 is facultative.
Figure 8.4: A Portion of a Dialogue with TEIRESIAS
An important premise underlying TEIRESIAS's approach to explanation is that the behavior of a program can be explained simply by referring to a trace of the program's execution. There are ways in which this assumption limits the kinds of explanations that can be produced, but it does minimize the overhead involved in generating each explanation. To understand how TEIRESIAS generates explanations of MYCIN's behavior, we need to know how that behavior is structured.
MYCIN attempts to solve its goal of recommending a therapy for a particular patient by first finding the cause of the patient's illness. It uses its production rules to reason backward from goals to clinical observations. To solve the top-level diagnostic goal, it looks for rules whose right sides suggest diseases. It then uses the left sides of those rules (the preconditions) to set up subgoals whose success would enable the rules to be invoked. These subgoals are again matched against rules, and their preconditions are used to set up additional subgoals. Whenever a precondition describes a specific piece of clinical evidence, MYCIN uses that evidence if it already has access to it. Otherwise, it asks the user to provide the information. In order that MYCIN's requests for information will appear coherent to the user, the actual goals that MYCIN sets up are often more general than they need be to satisfy the preconditions of an individual rule. For example, if a precondition specifies that the identity of an organism is X, MYCIN will set up the goal "infer identity." This approach also means that if another rule mentions the organism-1's identity, no further work will be required, since the identity will be known.
We can now return to the trace of TEIRESIAS-MYCIN's behavior shown in Figure above. The first question that the user asks is a "WHY" question, which is assumed to mean, "Why do you need to know that?" Particularly for clinical tests that are either expensive or dangerous, it is important for the doctor to be convinced that the information is really needed before ordering the test. (Requests for sensitive or confidential information present similar difficulties.) Because MYCIN is reasoning backward, the question can easily be answered by examining the goal tree. Doing so provides two kinds of information:
1. What higher-level question might the system be able to answer if it had the requested piece of information? (In this case, it could help determine the category of ORGANISM-1.)
2. What other information does the system already have that makes it think that the requested piece of knowledge would help? (In this case, facts [2.1] to [2.4].)
When TEIRESIAS provides the answer to the first of these questions, the user may be satisfied or may want to follow the reasoning process back even further. The user can do that by asking additional "WHY" questions.
When TEIRESIAS provides the answer to the second of these questions and tells the user what it already believes, the user may want to know the basis for those beliefs. The user can ask this with a "HOW" question, which TEIRESIAS will interpret as "How did you know that?" This question can also be answered by looking at the goal tree and chaining backward from the stated fact to the evidence that allowed a rule that determined the fact to fire. Thus we see that by reasoning backward from its top-level goal and by keeping track of the entire tree that it traverses in the process, TEIRESIAS- MYCIN can do a fairly good job of justifying its reasoning to a human user.
The production system model is very general, and without some restrictions, it is hard to support all the kinds of explanations that a human might want. If we focus on a particular type of problem solving, we can ask more probing questions. For example, SALT is a knowledge acquisition program used to build expert systems that design artifacts through a propose-and-revise strategy. SALT is capable of answering questions like WHY-NOT ("why didn't you assign value x to this parameter?") and WHAT-IF ("what would happen if you did?"). A human might ask these questions in order to locate incorrect or missing knowledge in the system as a precursor to correcting it. We now turn to ways in which a program such as SALT can support the process of building and refining knowledge.
Student Activity 8.2
Before reading the next section, answer the following questions.
What is the role of Expert System shells?
What are the characteristics of a knowledge acquisition system?
Contrast expert system and neural networks in terms of knowledge representation and knowledge acquisition. Give one domain in which the expert system approach would be more promising and one domain in which the neural network approach is more promising.
If your answers are correct, then proceed to the next section.
Knowledge Acquisition
How are expert systems built? Typically, a knowledge engineer interviews a domain expert to elucidate expert knowledge, which is then translated into rules. After the initial system is built, it must be iteratively refined until it approximates expert-level performance. This process is expensive and time-consuming, so it is worthwhile to look for more automatic ways of constructing expert knowledge bases. While no totally automatic knowledge acquisition systems yet exist, there are many programs that interact with domain experts to extract expert knowledge efficiently. These programs provide support for the following activities:
1. Entering knowledge
2. Maintaining knowledge base consistency
3. Ensuring knowledge base completeness
The most useful knowledge acquisition programs are those that are restricted to a particular problem-solving paradigm, e.g., diagnosis or design. It is important to be able to enumerate the roles that knowledge can play in the problem-solving process. For example, if the paradigm is diagnosis, then the program can structure its knowledge base around symptoms, hypotheses, and causes. It can identify symptoms for which the expert has not yet provided causes. Since one symptom may have multiple causes, the program can ask for knowledge about how to decide when one hypothesis is better than another. If we move to another type of problem solving, say designing artifacts, then these acquisition strategies no longer apply, and we must look for other ways of profitably interacting with an expert. We now examine two knowledge acquisition systems in detail.
MOLE is a knowledge acquisition system for heuristic classification problems, such as diagnosing diseases. In particular, it is used in conjunction with the cover-and-differentiate problem-solving method. An expert system produced by MOLE accepts input data, comes up with a set of candidate explanations or classifications that cover (or explain) the data, then uses differentiating knowledge to determine which one is best. The process is iterative, since explanations must themselves be justified, until ultimate causes are ascertained.
MOLE interacts with a domain expert to produce a knowledge base that a system called MOLE-p (for MOLE-performance) uses to solve problems. The acquisition proceeds through several steps:
1. Initial knowledge base construction. MOLE asks the expert to list common symptoms or complaints that might require diagnosis. For each symptom, MOLE prompts for a list of possible explanations. MOLE then iteratively seeks out higher-level explanations until it comes up with a set of ultimate causes. Whenever an event has multiple explanations, MOLE tries to determine the conditions under which one explanation is correct. The expert provides covering knowledge, that is, the knowledge that a hypothesized event might be the cause of a certain symptom. MOLE then tries to infer anticipatory knowledge, which says that if the hypothesized event does occur, then the symptom will definitely appear. This knowledge allows the system to rule out certain hypotheses on the basis that specific symptoms are absent.
2. Refinement of the knowledge base. MOLE now tries to identify the weaknesses of the knowledge base. One approach is to find holes and prompt the expert to fill them. It is difficult, in general, to know whether a knowledge base is complete, so instead MOLE lets the expert watch MOLE-p solving sample problems. Whenever MOLE-p makes an incorrect diagnosis, the expert adds new knowledge. There are several ways in which MOLE-p can reach the wrong conclusion. It may incorrectly reject a hypothesis because it does not feel that the hypothesis is needed to explain any symptom. It may advance a hypothesis because it is needed to explain some otherwise inexplicable hypothesis. Or it may lack differentiating knowledge for choosing between alternative hypotheses.
For example, suppose we have a patient with symptoms A and B. Further suppose that symptom A could be caused by events X and Y, and that symptom B could be caused by Y and Z. MOLE-p might conclude Y, since it explains both A and B. If the expert indicates that this decision was incorrect, then MOLE will ask what evidence should be used to prefer X and/or Z over Y.
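The X/Y/Z example can be played out in a few lines. This sketch shows only the cover step plus a deliberately crude differentiating criterion (prefer the hypothesis that explains the most symptoms); MOLE's real differentiating knowledge is elicited from the expert, not hard-wired like this.

```python
# Covering knowledge: hypothesis -> the symptoms it can explain
covers = {"X": {"A"}, "Y": {"A", "B"}, "Z": {"B"}}

def diagnose(symptoms):
    # Cover: candidate hypotheses that explain at least one symptom
    candidates = [h for h, s in covers.items() if s & symptoms]
    # Differentiate (toy criterion): prefer the widest-covering hypothesis
    return max(candidates, key=lambda h: len(covers[h] & symptoms))

print(diagnose({"A", "B"}))  # 'Y': it alone explains both symptoms
```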
MOLE has been used to build systems that diagnose problems with car engines, problems in steel-rolling mills, and inefficiencies in coal-burning power plants. For MOLE to be applicable, however, it must be possible to pre-enumerate solutions or classifications. It must also be practical to encode the knowledge in terms of covering and differentiating.
But suppose our task is to design an artifact, for example, an elevator system. It is no longer possible to pre-enumerate all solutions. Instead, we must assign values to a large number of parameters, such as the width of the platform, the type of door, the cable weight, and the cable strength. These parameters must be consistent with each other, and they must result in a design that satisfies external constraints imposed by cost factors, the type of building involved, and expected payloads.
One problem-solving method useful for design tasks is called propose-and-revise. Propose-and-revise systems build up solutions incrementally. First, the system proposes an extension to the current design. Then it checks whether the extension violates any global or local constraints. Constraint violations are then fixed, and the process repeats. It turns out that domain experts are good at listing overall design constraints and at providing local constraints on individual parameters, but not so good at explaining how to arrive at global solutions. The SALT program provides mechanisms for elucidating this knowledge from the expert.
Like MOLE, SALT builds a dependency network as it converses with the expert. Each node stands for a value of a parameter that must be acquired or generated. There are three kinds of links: contributes-to, constrains, and suggests-revision-of. Associated with the first type of link are procedures that allow SALT to generate a value for one parameter based on the value of another. The second type of link, constrains, rules out certain parameter values. The third link, suggests-revision-of, points to ways in which a constraint violation can be fixed. SALT uses the following heuristics to guide the acquisition process:
1. Every noninput node in the network needs at least one contributes-to link coming into it. If links are missing, the expert is prompted to fill them in.
2. No contributes-to loops are allowed in the network. Without a value for at least one parameter in the loop, it is impossible to compute values for any parameter in that loop. If a loop exists, SALT tries to transform one of the contributes-to links into a constraint link.
3. Constraining links should have suggests-revision-of links associated with them. These include constrains links that are created when dependency loops are broken.
Control knowledge is also important. It is critical that the system propose extensions and revisions that lead toward a design solution. SALT allows the expert to rate revisions in terms of how much trouble they tend to produce.
SALT compiles its dependency network into a set of production rules. As with MOLE, an expert can watch the production system solve problems and can override the system's decision. At that point, the knowledge base can be changed, or the override can be logged for future inspection.
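A sketch of a SALT-style dependency network and the first acquisition heuristic (every non-input parameter needs an incoming contributes-to link). The parameter names are invented for the elevator example, and SALT's real networks and checks are far more elaborate than this.

```python
# Links are (kind, source, target) triples
links = [
    ("contributes-to",       "payload",         "cable-weight"),
    ("contributes-to",       "cable-weight",    "cable-strength"),
    ("constrains",           "building-height", "platform-width"),
    ("suggests-revision-of", "building-height", "cable-weight"),
]
inputs = {"payload", "building-height"}

def missing_derivations(links, inputs):
    """Heuristic 1: flag non-input parameters with no contributes-to link in."""
    derived = {target for kind, _, target in links if kind == "contributes-to"}
    params = {p for _, a, b in links for p in (a, b)}
    return params - inputs - derived

for p in missing_derivations(links, inputs):
    print(f"prompt the expert: how is {p!r} derived?")  # flags platform-width
```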
The process of interviewing a human expert to extract expertise presents a number of difficulties, regardless of whether the interview is conducted by a human or by a machine. Experts are surprisingly inarticulate when it comes to how they solve problems. They do not seem to have access to the low-level details of what they do and are especially inadequate suppliers of any type of statistical information. There is, therefore, a great deal of interest in building systems that automatically induce their own rules by looking at sample problems and solutions. With inductive techniques, an expert needs only to provide the conceptual framework for a problem and a set of useful examples.
For example, consider a bank's problem in deciding whether to approve a loan. One approach to automating this task is to interview loan officers in an attempt to extract their domain knowledge. Another approach is to inspect the record of loans the bank has made in the past and then try to automatically generate rules that will maximize the number of good loans and minimize the number of bad ones in the future.
META-DENDRAL was the first program to use learning techniques to construct rules for an expert system automatically. It built rules to be used by DENDRAL, whose job was to determine the structure of complex chemical compounds. META-DENDRAL was able to induce its rules based on a set of mass spectrometry data; it was then able to identify molecular structures with very high accuracy. META-DENDRAL used the version space learning algorithm. Another popular method for automatically constructing expert systems is the induction of decision trees. Decision tree expert systems have been built for assessing consumer credit applications, analyzing hypothyroid conditions, and diagnosing soybean diseases, among many other applications.
Statistical techniques, such as multivariate analysis, provide an alternative approach to building expert-level systems. Unfortunately, statistical methods do not produce concise rules that humans can understand. Therefore it is difficult for them to explain their decisions.
For highly structured problems that require deep causal chains of reasoning, learning techniques are presently inadequate. There is, however, a great deal of research activity in this area.
Summary
• Expert systems use symbolic representations for knowledge (rules, networks, or frames) and perform their inference through symbolic computations that closely resemble manipulations of natural language. An expert system is usually built with the aid of one or more experts, who must be willing to spend a great deal of effort transferring their expertise to the system.
• Expert systems are complex AI programs. However, the most widely used way of representing domain knowledge in expert systems is as a set of production rules, which are often coupled with a frame system that defines the objects that occur in the rules.
• The most useful knowledge acquisition programs are those that are restricted to a particular problem-solving paradigm, e.g., diagnosis or design.
• Transfer of knowledge takes place gradually through many interactions between the expert and the system. The expert will never get the knowledge right or complete the first time.
• The amount of knowledge that is required depends on the task. It may range from forty rules to thousands.
• The choice of control structure for a particular system depends on specific characteristics of the system.
• It is possible to extract the nondomain-specific parts from existing expert systems and use them as tools for building new systems in new domains.
• MYCIN is an expert system which diagnoses infectious blood diseases and determines a recommended list of therapies for the patient.
• R1 (sometimes also called XCON) is a program that configures DEC VAX systems.

Artificial intelligence Section B

Knowledge Representation
Learning Objectives
After reading this unit you should appreciate the following:
Representation and Mapping
Approaches to Knowledge Representation
The Frame Problem
We discussed the role that knowledge plays in AI systems in earlier units. In the last unit, however, we paid little attention to knowledge and its importance, as we focused on basic frameworks for building search-based problem-solving programs. These methods are sufficiently general that we have been able to discuss them without reference to how the knowledge they need is to be represented. Although these methods are useful and form the skeleton of many of the methods we are about to discuss, their problem-solving power is limited precisely because of their generality. As we look in more detail at ways of representing knowledge, it becomes clear that particular knowledge representation models allow for more specific, more powerful problem-solving mechanisms that operate on them. In this unit, we return to the topic of knowledge used within programs and examine specific techniques that can be used for representing and manipulating knowledge.
Representations and Mappings
Artificial Intelligence needs both a large amount of knowledge and some mechanisms for manipulating that knowledge to create solutions to new problems. A variety of ways of representing knowledge have been exploited in AI programs. But before we can talk about them individually, we must consider the following point that pertains to all discussions of representation, namely that we are dealing with two different kinds of entities:
Facts: truths in some relevant world. These are the things we want to represent.
Representations of facts in some chosen formalism. These are the things we will actually be able to manipulate.



Figure 3.1: Mappings between Facts and Representations
One way to think of structuring these entities is as two levels:
The knowledge level, at which facts (including each agent's behaviors and current goals) are described.
The symbol level, at which representations of objects at the knowledge level are defined in terms of symbols that can be manipulated by programs.
Rather than thinking of one level on top of another, we will focus on facts, on representations, and on the two-way mappings that must exist between them, as shown in Figure 3.1. We will call these links representation mappings. The forward representation mapping maps from facts to representations. The backward representation mapping goes the other way, from representations to facts.
One representation of facts is so common that it deserves special mention: natural language sentences. Regardless of the representation for facts that we use in a program, we may also need to be concerned with an English representation of those facts in order to facilitate getting information into and out of the system. In this case, we must also have mapping functions from English sentences to the representation we are actually going to use and from it back to sentences. Figure 3.1 shows how these three kinds of objects relate to each other.
Let's look at a simple example using mathematical logic as the representational formalism. Consider the English sentence:
Tom is a cat.
The fact represented by this English sentence can also be represented in logic as:
Cat (Tom)
Suppose that we also have a logical representation of the fact that all cats have tails:
"x : cats (x)à hastail(x)
Then, using the deductive mechanisms of logic, we may generate the new representation object:
hastail(Tom)
Using an appropriate backward mapping function, we could then generate the English sentence:
Tom has a tail.
Or we could make use of this representation of a new fact to cause us to take some appropriate action or to derive representations of additional facts.
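The Tom-the-cat example can be mirrored directly in code: a forward mapping encodes facts, an inference step manipulates the representations, and a backward mapping renders the results in English. A minimal sketch, with the representation scheme invented for illustration:

```python
# Forward mapping: facts represented as (predicate, individual) pairs
facts = {("cat", "Tom")}

def deduce(facts):
    """Apply the rule  forall x : cat(x) -> hastail(x)."""
    return facts | {("hastail", x) for pred, x in facts if pred == "cat"}

def to_english(fact):
    """Backward mapping from representations to English sentences."""
    pred, x = fact
    return {"cat": f"{x} is a cat.",
            "hastail": f"{x} has a tail."}[pred]

for fact in sorted(deduce(facts)):
    print(to_english(fact))   # Tom is a cat. / Tom has a tail.
```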
It is important to keep in mind that usually the available mapping functions are not one-to-one. In fact, they are often not even functions but rather many-to-many relations. (In other words, each object in the domain may map to several elements in the range, and several elements in the domain may map to the same element of the range.) This is particularly true of the mappings involving English representations of facts. For example, the two sentences "All cats have tails" and "Every cat has a tail" could both represent the same fact, namely that every cat has at least one tail. On the other hand, the former could represent either the fact that every cat has at least one tail or the fact that each cat has several tails. The latter may represent either the fact that every cat has at least one tail or the fact that there is a tail that every cat has. As we will see shortly, when we try to convert English sentences into some other representation, such as logical propositions, we must first decide what facts the sentences represent and then convert those facts into the new representation.
The starred links of Figure 3.1 are key components of the design of any knowledge-based program. Sometimes, a good representation makes the operation of a reasoning program not only correct but also trivial. A well-known example of this occurs in the context of the mutilated checkerboard problem, which can be stated as follows:
The Mutilated Checkerboard Problem. Consider a normal checkerboard from which two squares in opposite corners have been removed. The task is to cover all the remaining squares exactly with dominoes, each of which covers two squares. No overlapping, either of dominoes on top of each other or of dominoes over the boundary of the mutilated board, is allowed. Can this task be done?
To solve this problem we can enumerate all possible tilings to see if one works. But suppose one wants to be cleverer. Figure 3.2 shows three ways in which the mutilated checkerboard could be represented (to a person). The first representation does not directly suggest the answer to the problem. The second may; the third does, when combined with the single additional fact that each domino must cover exactly one white square and one black square. Even for human problem solvers, a representation shift may make an enormous difference in problem-solving effectiveness.
Figure 3.2: Three Representations of a Mutilated Checkerboard
The mapping between the facts and representations which was shown in Figure 3.1 has been expanded in Figure 3.3. The dotted line across the top represents the abstract reasoning process that a program is intended to model. The solid line across the bottom represents the concrete reasoning process that a particular program performs. This program successfully models the abstract process to the extent that, when the backward representation mapping is applied to the program's output, the appropriate final facts are actually generated. If either the program’s operation or one of the representation mappings is not faithful to the problem that is being modeled, then the final facts will probably not be the desired ones. The key role that is played by the nature of the representation mapping is apparent from this figure. If no good mapping can be defined for a problem, then no matter how good the program to solve the problem is, it will not be able to produce answers that correspond to real answers to the problem.
Figure 3.3 looks very much like the figure that might appear in a general programming book as a description of the relationship between an abstract data type (such as a set) and a concrete implementation of that type (e.g. as a linked list of elements). But there are some differences. For example, in data type design it is expected that the mapping that we are calling the backward representation mapping is a function (i.e., every representation corresponds to only one fact) and that it is onto (i.e., there is at least one representation for every fact). Unfortunately, in many AI domains, it may not be possible to come up with such a representation mapping, and we may have to live with one that gives less ideal results. But the main idea of what we are doing is the same as what programmers always do, namely to find concrete implementations of abstract concepts.
Figure 3.3: Representation of Fact
Student Activity 3.1
Before reading the next section, answer the following questions.
1. Discuss different ways of knowledge representation.
2. Represent the following facts in logic:
John plays football.
Jim loves Jani.
Caesar was a ruler.
If your answers are correct, then proceed to the next section.
Approaches to Knowledge Representation
A good system for the representation of knowledge in a particular domain should possess the following four properties:
Representational Adequacy – the ability to represent all of the kinds of knowledge that are needed in that domain.
Inferential Adequacy – the ability to manipulate the representational structures in such a way as to derive new structures corresponding to new knowledge inferred from old.
Inferential Efficiency – the ability to incorporate into the knowledge structure additional information that can be used to focus the attention of the inference mechanisms in the most promising directions.
Acquisitional Efficiency – the ability to acquire new information easily. The simplest case involves direct insertion, by a person, of new knowledge into the database. Ideally, the program itself would be able to control knowledge acquisition.
Multiple techniques for knowledge representation exist because, unfortunately, no single system that optimizes all of these capabilities for all kinds of knowledge has yet been found. Many programs rely on more than one technique. As we proceed, the most important of these techniques are described in detail. But in this section, we provide a simple, example-based introduction to the important ideas.
Simple Relational Knowledge
The simplest way to represent declarative facts is as a set of relations of the same sort used in database systems. Figure 3.4 shows an example of such a relational system.

Figure 3.4: Simple Relational Knowledge
The reason that this representation is simple is that standing alone it provides very weak inferential capabilities. But knowledge represented in this form may serve as the input to more powerful inference engines. For example, given just the facts of Figure 3.4, it is not possible even to answer the simple question, "Who is the heaviest player?" But if a procedure for finding the heaviest player is provided, then these facts will enable the procedure to compute an answer. If, instead, we are provided with a set of rules for deciding which hitter to put up against a given pitcher (based on right- and left- handedness, say), then this same relation can provide at least some of the information required by those rules.
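To make this concrete, here is a minimal Python sketch of that division of labour: the relation by itself answers nothing, while a small procedure supplied alongside it computes the answer to the "heaviest player" question. The rows are illustrative assumptions, since the actual contents of Figure 3.4 are not reproduced here.

# Relational facts as rows; the relation alone has no inferential power.
players = [
    {"name": "Hank-Aaron",  "height": 72, "weight": 180, "bats": "Right"},
    {"name": "Willie-Mays", "height": 70, "weight": 170, "bats": "Right"},
    {"name": "Babe-Ruth",   "height": 74, "weight": 215, "bats": "Left"},
]

def heaviest_player(rows):
    # The procedure, not the relation, does the inferential work.
    return max(rows, key=lambda row: row["weight"])["name"]

print(heaviest_player(players))   # Babe-Ruth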
Providing support for relational knowledge is what database systems are designed to do. Thus we do not need to discuss this kind of knowledge representation structure further here. The practical issues that arise in linking a database system that provides this kind of support to a knowledge representation system that provides some of the other capabilities that we are about to discuss have already been solved in several commercial products.
Inheritable Knowledge
It is possible to augment the basic representation with inference mechanisms that operate on the structure of the representation. For this to be effective the structure must be designed to correspond to the inference mechanisms that are desired. One of the most useful forms of inference is property inheritance, in which elements of specific classes inherit attributes and values from more general classes in which they are included.
Objects must be organized into classes and classes must be arranged in a generalization hierarchy in order to support property inheritance. Figure 3.5 shows some additional baseball knowledge inserted into a structure that is so arranged. Lines represent attributes. Boxed nodes represent objects and values of attributes of objects. These values can also be viewed as objects with attributes and values, and so on. The arrows on the lines point from an object to its value along the corresponding attribute line. The structure shown in the figure is a slot-and-filler structure. It may also be called a semantic network or a collection of frames. In the latter case, each individual frame represents the collection of attributes and values associated with a particular node. Figure 3.6 shows the node for baseball player displayed as a frame.
Figure 3.5: Inheritable Knowledge
Usually the use of the term frame system implies somewhat more structure on the attributes and the inference mechanisms that are available to apply to them than does the term semantic network.
Figure 3.6: Viewing a Node as a Frame
All of the objects and most of the attributes shown in this example have been chosen to correspond to the baseball domain, and they have no general significance. The two exceptions to this are the attribute isa, which is being used to show class inclusion, and the attribute instance, which is being used to show class membership. These two specific (and generally useful) attributes provide the basis for property inheritance as an inference technique. Using this technique, the knowledge base can support retrieval both of facts that have been explicitly stored and of facts that can be derived from those that are explicitly stored. An idealized form of the property inheritance algorithm can be stated as follows:
Algorithm: Property Inheritance
To retrieve a value V for attribute A of an instance object O:
1. Find O in the knowledge base.
2. If there is a value there for the attribute A, report that value.
3. Otherwise, see if there is a value for the attribute instance. If not, then fail.
4. Otherwise, move to the node corresponding to that value and look for a value for the attribute A. If one is found, report it.
5. Otherwise, do until there is no value for the isa attribute or until an answer is found:
a. Get the value of the isa attribute and move to that node.
b. See if there is a value for the attribute A. If there is, report it.
This procedure is simplistic. It does not say what we should do if there is more than one value of the instance or isa attribute. But it does describe the basic mechanism of inheritance. We can apply this procedure to our example knowledge base to derive answers to the following queries:
team(Pee-Wee-Reese) = Brooklyn-Dodgers. This attribute had a value stored explicitly in the knowledge base.
batting-average(Three-Finger-Brown) = .106. Since there is no value for batting average stored explicitly for Three Finger Brown, we follow the instance attribute to Pitcher and extract the value stored there. Now we observe one of the critical characteristics of property inheritance, namely that it may produce default values that are not guaranteed to be correct but that represent "best guesses" in the face of a lack of more precise information. In fact, in 1906, Brown's batting average was .204.
height(Pee-Wee-Reese) = 6-1. This represents another default inference. Notice here that because we get to it first, the more specific fact about the height of baseball players overrides a more general fact about the height of adult males.
bats(Three-Finger-Brown) = Right. To get a value for the attribute bats required going up the isa hierarchy to the class Baseball-Player. But what we found there was not a value but a rule for computing a value. This rule required another value (that for handed) as input. So the entire process must be begun again recursively to find a value for handed. This time, it is necessary to go all the way up to Person to discover that the default value for handedness for people is Right. Now the rule for bats can be applied, producing the result Right. In this case, that turns out to be wrong, since Brown is a switch hitter (i.e., he can hit both left- and right-handed).
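The sketch below is one possible Python rendering of the idealized algorithm above, run against a hand-built fragment of the baseball knowledge base. The frame contents, numbers, and helper names are assumptions patterned on Figures 3.5 and 3.6, not reproductions of them.

KB = {
    "Person":          {"handed": "Right"},
    "Adult-Male":      {"isa": "Person", "height": "5-10"},
    "Baseball-Player": {"isa": "Adult-Male", "height": "6-1",
                        # a rule rather than a value: compute bats from handed
                        "bats": lambda obj: get_value(obj, "handed")},
    "Pitcher":         {"isa": "Baseball-Player", "batting-average": ".106"},
    "Fielder":         {"isa": "Baseball-Player", "batting-average": ".262"},
    "Three-Finger-Brown": {"instance": "Pitcher"},
    "Pee-Wee-Reese":      {"instance": "Fielder", "team": "Brooklyn-Dodgers"},
}

def get_value(obj, attr):
    # Step 2: a value stored directly on the object wins.
    if attr in KB[obj]:
        return resolve(KB[obj][attr], obj)
    # Steps 3-4: otherwise move to the node named by the instance attribute.
    node = KB[obj].get("instance")
    # Step 5: climb the isa chain until a value is found or the chain ends.
    while node is not None:
        if attr in KB[node]:
            return resolve(KB[node][attr], obj)
        node = KB[node].get("isa")
    return None   # fail

def resolve(value, obj):
    # A stored "value" may really be a rule; apply it to the original instance.
    return value(obj) if callable(value) else value

print(get_value("Pee-Wee-Reese", "team"))                  # Brooklyn-Dodgers
print(get_value("Three-Finger-Brown", "batting-average"))  # .106 (default)
print(get_value("Pee-Wee-Reese", "height"))                # 6-1 (more specific)
print(get_value("Three-Finger-Brown", "bats"))             # Right (via rule)

Note how bats is stored on Baseball-Player as a rule rather than a value, so retrieving it triggers a second, recursive lookup for handed, exactly as in the fourth query described above.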
Inferential Knowledge


Figure 3.7 shows two examples of the use of first-order predicate logic to represent additional knowledge about baseball.
Figure 3.7: Inferential Knowledge
Of course, this knowledge is useless unless there is also an inference procedure that can exploit it. The required inference procedure now is one that implements the standard logical rules of inference. There are many such procedures, some of which reason forward from given facts to conclusions, others of which reason backward from desired conclusions to given facts. One of the most commonly used of these procedures is resolution, which exploits a proof by contradiction strategy.
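As a drastically simplified illustration of forward reasoning, the following sketch chains over single-premise rules such as "every pitcher is a baseball player." The facts and rules are assumptions (Figure 3.7's contents are not reproduced here), and a real prover would need unification, multi-premise rules, and, for backward reasoning or resolution, considerably more machinery.

facts = {("pitcher", "Three-Finger-Brown")}

# Each rule pairs one premise pattern with one conclusion pattern;
# "?x" is the single variable.
rules = [
    (("pitcher", "?x"), ("baseball-player", "?x")),
    (("baseball-player", "?x"), ("person", "?x")),
]

def forward_chain(facts, rules):
    # Keep applying rules until no new facts are produced.
    changed = True
    while changed:
        changed = False
        for (p_pred, _), (c_pred, _) in rules:
            for pred, arg in list(facts):
                if pred == p_pred and (c_pred, arg) not in facts:
                    facts.add((c_pred, arg))
                    changed = True
    return facts

print(sorted(forward_chain(facts, rules)))
# [('baseball-player', 'Three-Finger-Brown'), ('person', ...), ('pitcher', ...)]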
In general, in fact, all of the techniques we are describing here should not be regarded as complete and incompatible ways of representing knowledge. Instead, they should be viewed as building blocks of a complete representational system.
Procedural Knowledge
Another, equally useful, kind of knowledge is operational, or procedural knowledge, that specifies what to do when. Procedural knowledge can be represented in programs in many ways. The most common way is simply as code (in some programming language such as LISP) for doing something. The machine uses the knowledge when it executes the code to perform a task. Unfortunately, this way of representing procedural knowledge gets low scores with respect to the properties of inferential adequacy (because it is very difficult to write a program that can reason about another program's behaviour) and acquisitional efficiency (because the process of updating and debugging large pieces of code becomes unwieldy).
Figure 3.8: Using LISP Code to Define a Value
As an extreme example, compare the representation of the way to compute the value of bats shown in Figure 3.6 to one in LISP shown in Figure 3.8. Although the LISP one will work given a particular way of storing attributes and values in a list, it does not lend itself to being reasoned about in the same straightforward way as the representation of Figure 3.6 does. The LISP representation is slightly more powerful since it makes explicit use of the name of the node whose value for handed is to be found. But if this matters, the simpler representation can be augmented to do this as well.
Because of this difficulty in reasoning with LISP, attempts have been made to find other ways of representing procedural knowledge so that it can relatively easily be manipulated both by other programs and by people.
The most commonly used technique for representing procedural knowledge in AI programs is the use of production rules. Figure 3.9 shows an example of a production rule that represents a piece of operational knowledge typically possessed by a baseball player.
Production rules, particularly ones that are augmented with information on how they are to be used, are more procedural than are the other representation methods discussed in this chapter. But making a clean distinction between declarative and procedural knowledge is difficult. Although at an intuitive level such a distinction makes some sense, at a formal level it disappears. In fact, as you can see, the structure of the declarative knowledge of Figure 3.7 is not substantially different from that of the operational knowledge of Figure 3.9. The important difference is in how the knowledge is used by the procedures that manipulate it.
Figure 3.9: Procedural Knowledge as Rules
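Since Figure 3.9 itself is not reproduced here, the Python sketch below shows only the general IF-conditions-THEN-action shape of such a rule; the conditions and action are illustrative assumptions about a baseball player's operational knowledge.

def walk_the_batter_rule(state):
    # IF the game situation matches the conditions THEN suggest the action.
    if (state["inning"] == 9 and state["outs"] < 2
            and state["first_base_vacant"]
            and state["batter_avg"] > state["next_batter_avg"]):
        return "walk the batter"
    return None   # rule does not fire

state = {"inning": 9, "outs": 1, "first_base_vacant": True,
         "batter_avg": 0.310, "next_batter_avg": 0.240}
print(walk_the_batter_rule(state))   # walk the batter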
Top
Issues in Knowledge Representation
Before embarking on a discussion of specific mechanisms that have been used to represent various kinds of real-world knowledge, we need briefly to discuss several issues that cut across all of them:
Are any attributes of objects so basic that they occur in almost every problem domain? If there are, we need to make sure that they are handled appropriately in each of the mechanisms we propose. If such attributes exist, what are they?
Are there any important relationships that exist among attributes of objects?
At what level should knowledge be represented? Is there a good set of primitives into which all knowledge can be broken down? Is it helpful to use such primitives?
How should sets of objects be represented?
Given a large amount of knowledge stored in a database, how can relevant parts be accessed when they are needed?
We will talk about each of these questions briefly in the next five sections.
Important Attributes
Instance and isa are two attributes that are of very general significance. These attributes are important because they support property inheritance. They are called a variety of things in AI systems, but the names do not matter. What does matter is that they represent class membership and class inclusion and that class inclusion is transitive. In slot-and-filler systems, these attributes are usually represented explicitly in a way much like that shown in Figures 3.5 and 3.6. In logic-based systems, these relationships may be represented this way or they may be represented implicitly by a set of predicates describing particular classes.
Relationships among Attributes
The attributes that we use to describe objects are themselves entities that we represent. What properties do they have independent of the specific knowledge they encode? There are four such properties that deserve mention here:
Inverses
Existence in an isa hierarchy
Techniques for reasoning about values
Single-valued attributes
Inverses
Entities in the world are related to each other in many different ways. But as soon as we decide to describe those relationships as attributes, we commit to a perspective in which we focus on one object and look for binary relationships between it and others. Attributes are those relationships. So, for example, in Figure 3.5, we used the attributes instance, isa, and team. Each of these was shown in the figure with a directed arrow, originating at the object that was being described and terminating at the object representing the value of the specified attribute. But we could equally well have focused on the object representing the value. If we do that, then there is still a relationship between the two entities, although it is a different one since the original relationship was not symmetric (although some relationships, such as sibling are). In many cases, it is important to represent this other view of relationships. There are two good ways to do this.
The first is to represent both relationships in a single representation that ignores focus. Logical representations are usually interpreted as doing this. For example, the assertion:
team(Pee-Wee-Reese, Brooklyn-Dodgers)
can equally easily be interpreted as a statement about Pee Wee Reese or about the Brooklyn Dodgers. How it is actually used depends on the other assertions that a system contains.
The second approach is to use attributes that focus on a single entity but to use them in pairs, one the inverse of the other. In this approach, we would represent the team information with two attributes:
one associated with Pee Wee Reese
team = Brooklyn-Dodgers
one associated with Brooklyn Dodgers
team-members = Pee-Wee-Reese, …
This is the approach that is taken in semantic net and frame based systems. When it is used, it is usually accompanied by a knowledge acquisition tool that guarantees the consistency of inverse slots by forcing them to be declared, and then checking each time a value is added to one attribute that the corresponding value is added to the inverse.
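A minimal sketch of that bookkeeping follows, assuming team and team-members have been declared as inverses; the table and function are my own illustration, not a particular system's API.

INVERSES = {"team": "team-members", "team-members": "team"}
kb = {}

def add_value(obj, attr, value):
    # Record the value, then make sure the inverse slot records it too.
    slot = kb.setdefault(obj, {}).setdefault(attr, set())
    if value in slot:
        return                      # already consistent; stops the recursion
    slot.add(value)
    inv = INVERSES.get(attr)
    if inv is not None:
        add_value(value, inv, obj)

add_value("Pee-Wee-Reese", "team", "Brooklyn-Dodgers")
print(kb["Brooklyn-Dodgers"]["team-members"])   # {'Pee-Wee-Reese'}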
An Isa Hierarchy of Attributes
There are attributes and specializations of attributes, just as there are classes of objects and specialized subsets of those classes. Consider, for example, the attribute height. It is actually a specialization of the more general attribute physical-size, which is, in turn, a specialization of physical-attribute. These generalization-specialization relationships are important for attributes for the same reason that they are important for other concepts: they support inheritance. In the case of attributes, they support inheriting information about such things as constraints on the values that the attribute can have and mechanisms for computing those values.
Techniques for Reasoning about Values
Sometimes values of attributes are specified explicitly when a knowledge base is created. We saw several examples of that in the baseball example of Figure 3.5. But often the reasoning system must reason about values it has not been given explicitly. Several kinds of information can play a role in this reasoning, including:
Information about the type of the value. For example, the value of height must be a number measured in a unit of length.
Constraints on the value often stated in terms of related entities. For example, the age of a person cannot be greater than the age of either of that person's parents.
Rules for computing the value when it is needed. We showed an example of such a rule in Figure 3.5 for the bats attribute. These rules are called backward rules. Such rules have also been called if-needed rules.
Rules that describe actions that should be taken, if a value ever becomes known. These rules are called forward rules, or sometimes if-added rules.
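A toy sketch of both kinds of rules, with assumed names and rule bodies: the if-needed rule runs only when a value is requested, while the if-added rule fires as soon as a value becomes known.

kb = {"Pee-Wee-Reese": {}}

def height(obj):
    # if-needed (backward) rule: consult the stored value, else compute a default
    return kb[obj].get("height", "6-1")

def team_added(obj, value):
    # if-added (forward) rule: when a team becomes known, derive the league too
    kb[obj]["league"] = "National" if value == "Brooklyn-Dodgers" else "unknown"

kb["Pee-Wee-Reese"]["team"] = "Brooklyn-Dodgers"
team_added("Pee-Wee-Reese", "Brooklyn-Dodgers")
print(height("Pee-Wee-Reese"), kb["Pee-Wee-Reese"]["league"])   # 6-1 National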
Single-Valued Attributes
A specific but very useful kind of attribute is one that takes a unique value. For example, a baseball player can, at any one time, have only a single height and be a member of only one team. If there is already a value present for one of these attributes and a different value is asserted, then one of two things has happened. Either a change has occurred in the world or there is now a contradiction in the knowledge base that needs to be resolved. Knowledge-representation systems have taken several different approaches to provide support for single-valued attributes, including:
Introduce an explicit notation for temporal interval. If two different values are ever asserted for the same temporal interval, signal a contradiction automatically.
Assume that the only temporal interval that is of interest is now. So if a new value is asserted, replace the old value.
Provide no explicit support. Logic-based systems are in this category. But in these systems, knowledge-base builders can add axioms that state that if an attribute has one value then it is known not to have all other values.
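The sketch below contrasts the first two policies for an assumed single-valued attribute team: signalling a contradiction for the same temporal interval versus assuming that the only interval of interest is now and simply replacing the old value.

kb = {"Pee-Wee-Reese": {"team": "Brooklyn-Dodgers"}}

def assert_single_valued(obj, attr, value, policy="replace"):
    old = kb.setdefault(obj, {}).get(attr)
    if old is not None and old != value and policy == "signal":
        # first policy: two values for the same interval is a contradiction
        raise ValueError(f"contradiction: {obj}.{attr} is {old}, not {value}")
    kb[obj][attr] = value   # second policy: the world has simply changed

assert_single_valued("Pee-Wee-Reese", "team", "Los-Angeles-Dodgers")
print(kb["Pee-Wee-Reese"]["team"])   # Los-Angeles-Dodgers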
Choosing the Granularity of Representation
It is necessary to answer the question "At what level of detail should the world be represented?" irrespective of the particular formalism. Should there be a small number of low-level primitives, or should there be a larger number covering a range of granularities? A brief example illustrates the problem. Suppose we are interested in the following fact:
John spotted Sue.
We could represent this as
spotted(agent(John), object(Sue))
Such a representation would make it easy to answer questions such as:
Who spotted Sue?
But now suppose we want to know:
Did John see Sue?
The obvious answer is "yes," but given only the one fact we have, we cannot discover that answer. We could, of course, add other facts, such as
spotted(x, y) → saw(x, y)
We could then infer the answer to the question.
An alternative solution to this problem is to represent explicitly, in the representation of the fact, that spotting is really a special type of seeing. We might write something such as:
saw(agent(John), object(Sue), timespan(briefly))
Here we have broken the idea of spotting apart into more primitive concepts of seeing and timespan. Using this representation, the fact that John saw Sue is immediately accessible. But the fact that he spotted her is more difficult to get to. The major advantage of converting all statements into a representation in terms of a small set of primitives is that the rules that are used to derive inferences from that knowledge need be written only in terms of the primitives rather than in terms of the many ways in which the knowledge may originally have appeared. Thus what is really being argued for is simply some sort of canonical form.
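A tiny sketch of such a canonicalization step, with illustrative structure names, might look like this:

def canonicalize(fact):
    # Rewrite the non-primitive "spotted" into the primitives it stands for.
    pred, roles = fact
    if pred == "spotted":
        return ("saw", {**roles, "timespan": "brief"})
    return fact

fact = ("spotted", {"agent": "John", "object": "Sue"})
print(canonicalize(fact))
# ('saw', {'agent': 'John', 'object': 'Sue', 'timespan': 'brief'})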
One of the several arguments which go against the use of low-level primitives is that simple high-level facts may require a lot of storage when broken down into primitives. Much of that storage is really wasted since the low-level rendition of a particular high-level concept will appear many times, once for each time the high-level concept is referenced.
Conceptual parsing illustrates this use of primitives and its cost. Conceptual parsing, like semantic grammars, is a strategy for finding both the structure and the meaning of a sentence in one step. It is driven by a dictionary that describes the meanings of words as conceptual dependency (CD) structures. The first step in mapping a sentence into its CD representation involves a syntactic processor that extracts the main noun and verb. It also determines the syntactic category and aspectual class of the verb (i.e., stative, transitive, or intransitive). The conceptual processor then takes over. It makes use of a verb-ACT dictionary, which contains an entry for each environment in which a verb can appear. For example, suppose that actions are being represented as combinations of a small set of primitive actions. Then the fact that John punched Mary might be represented as shown in Figure 3.10(a). The representation says that there was physical contact between John's fist and Mary. The contact was caused by John propelling his fist toward Mary, and in order to do that John first went to where Mary was. But suppose we also know that Mary punched John. Then we must also store the structure shown in Figure 3.10(b). If, however, punching were represented simply as punching, then most of the detail of both structures could be omitted from the structures themselves. It could instead be stored just once in a common representation of the concept of punching.
A second, related problem is that substantial work must be done to reduce the knowledge into primitive form if knowledge is initially presented to the system in a relatively high-level form, such as English. Both in understanding language and in interpreting the world that we see, many things appear that later turn out to be irrelevant. For the sake of efficiency, it may be desirable to store these things at a very high level and then to analyze in detail only those inputs that appear to be important.
A third problem with the use of low-level primitives is that in many domains, it is not at all clear what the primitives should be. And even in domains in which there may be an obvious set of primitives, there may not be enough information present in each use of the high-level constructs to enable them to be converted into their primitive components. When this is true, there is no way to avoid representing facts at a variety of granularities.
In the domain of family relationships, for example, there exists at least one obvious set of primitives: mother, father, son, daughter, and possibly brother and sister. But now suppose we are told that Mary is Sue's cousin. An attempt to describe the cousin relationship in terms of the primitives could produce any of the following interpretations:
Mary = daughter(brother(mother(Sue)))
Mary = daughter(sister(mother(Sue)))
Mary = daughter(brother(father(Sue)))
Mary = daughter(sister(father(Sue)))
Figure 3.10: Redundant Representations
As illustrated, the problem of choosing the correct granularity of representation for a particular body of knowledge is not easy. Clearly, the lower the level we choose, the less inference is required to reason with it in some cases, but the more inference is required to create the representation from English and the more room it takes to store, since many inferences will be represented many times. The answer for any particular task domain must come to a large extent from the domain itself: to what use is the knowledge to be put?
One way of looking at the question of whether there exists a good set of low-level primitives is that it is a question of the existence of a unique representation. Does there exist a single, canonical way in which large bodies of knowledge can be represented independently of how they were originally stated? Another, closely related, uniqueness question asks whether individual objects can be represented uniquely and independently of how they are described.
The phrase Evening Star names a certain large physical object of spherical form, which is hurtling through space some scores of millions of miles from here. The phrase Morning Star names the same thing, as was probably first established by some observant Babylonian. But the two phrases cannot be regarded as having the same meaning; otherwise that Babylonian could have dispensed with his observations and contented himself with reflecting on the meaning of his words. The meanings, then, being different from one another, must be other than the named object, which is one and the same in both cases. For a program to be able to reason as did the Babylonian, it must be able to handle several distinct representations that turn out to stand for the same object.
Representing Sets of Objects
There are several reasons why we must be able to represent sets of objects. One is that there are some properties that are true of sets that are not true of the individual members of a set. As examples, consider the assertions that are being made in the sentences "There are more sheep than people in Australia" and "English speakers can be found all over the world." The only way to represent the facts described in these sentences is to attach assertions to the sets representing people, sheep, and English speakers, since, for example, no single English speaker can be found all over the world.
Secondly, if a property is true of all (or even most) elements of a set, then it is more efficient to associate it once with the set rather than to associate it explicitly with every element of the set. We have already looked at ways of doing that, both in logical representations through the use of the universal quantifier and in slot-and-filler structures, where we used nodes to represent sets and inheritance to propagate set-level assertions down to individuals. As we consider ways to represent sets, we will want to consider both of these uses of set-level representations. We will also need to remember that the two uses must be kept distinct. Thus if we assert something like large(Elephant), it must be clear whether we are asserting some property of the set itself or some property that holds for individual elements of the set.
The simplest way in which a set may be represented is just by a name. This simple representation does make it possible to associate predicates with sets. But it does not, by itself, provide any information about the set it represents. It does not, for example, tell how to determine whether a particular object is a member of the set or not.
There are two ways to state a definition of a set and its elements. The first is to list the members. Such a specification is called an extensional definition. The second is to provide a rule that, when a particular object is evaluated, returns true or false depending on whether the object is in the set or not. Such a rule is called an intensional definition. For example, an extensional description of the set of our sun's planets on which people live is {Earth}. An intensional description is
{x : sun-planet(x) ∧ human-inhabited(x)}
For simple sets, it may not matter, except possibly with respect to efficiency concerns, which representation is used. But the two kinds of representations can function differently in some cases.
One way in which extensional and intensional representations differ is that they do not necessarily correspond one-to-one with each other. For example, the extensionally defined set {Earth} has many intensional definitions in addition to the one we just gave. Others include:
{x : sun-planet(x) ∧ nth-farthest-from-sun(x, 3)}
{x : sun-planet(x) ∧ nth-biggest(x, 5)}
Thus, while it is trivial to determine whether two sets are identical if extensional descriptions are used, it may be very difficult to do so using intensional descriptions. Intensional representations have two important properties that extensional ones lack, however. The first is that they can be used to describe infinite sets and sets not all of whose elements are explicitly known. Thus we can describe intensionally such sets as prime numbers (of which there are infinitely many) or kings of England (even though we do not know who all of them are or even how many of them there have been). The second thing we can do with intensional descriptions is to allow them to depend on parameters that can change, such as time or spatial location. If we do that, then the actual set that is represented by the description will change as a function of the value of those parameters. To see the effect of this, consider the sentence, "The president of the United States used to be a Democrat," uttered when the current president is a Republican. This sentence can mean two things. The first is that the specific person who is now president was once a Democrat. This meaning can be captured straightforwardly with an extensional representation of "the president of the United States." We just specify the individual. But there is a second meaning, namely that there was once someone who was the president and who was a Democrat. To represent the meaning of "the president of the United States" given this interpretation requires an intensional description that depends on time. Thus we might write president(t), where president is some function that maps instances of time onto instances of people, namely U.S. presidents.
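The difference is easy to see in code. Below, the same set {Earth} is given once extensionally and once intensionally; the predicate names and the planet list are illustrative assumptions.

extensional = {"Earth"}   # definition by listing the members

def sun_planet(x):
    return x in {"Mercury", "Venus", "Earth", "Mars", "Jupiter",
                 "Saturn", "Uranus", "Neptune"}

def human_inhabited(x):
    return x == "Earth"

def intensional(x):
    # definition by rule: true exactly for members of the set
    return sun_planet(x) and human_inhabited(x)

print([p for p in ("Venus", "Earth", "Mars") if intensional(p)])   # ['Earth']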
Finding the Right Structures as Needed
Suppose we have a script (a description of a class of events in terms of contexts, participants, and subevents) that describes the typical sequence of events in a restaurant. This script would enable us to take a text such as:
John went to Steak and Ale last night. He ordered a large rare steak, paid his bill, and left.
and answer "yes" to the question:
Did John eat dinner last night?
Notice that nowhere in the story was it mentioned explicitly that John ate anything. But the fact that when one goes to a restaurant one eats will be contained in the restaurant script. If we know in advance to use the restaurant script, then we can answer the question easily. But in order to be able to reason about a variety of things, a system must have many scripts for everything from going to work to sailing around the world. How will it select the appropriate one each time? For example, nowhere in our story was the word "restaurant" mentioned.
In fact, in order to have access to the right structure for describing a particular situation, it is necessary to solve all of the following problems.
How to perform an initial selection of the most appropriate structure.
How to fill in appropriate details from the current situation.
How to find a better structure if the one chosen initially turns out not to be appropriate.
What to do if none of the available structures is appropriate.
When to create and remember a new structure.
There is no good general purpose method for solving all these problems. Some knowledge-representation techniques solve some of them. In this section we survey some solutions to two of these problems: how to select an initial structure to consider and how to find a better structure if that one turns out not to be a good match.
Selecting an Initial Structure
Selecting candidate knowledge structures to match a particular problem-solving situation is a hard problem; there are several ways in which it can be done. Three important approaches are the following:
1. Index the structures directly by the significant English words that can be used to describe them. For example, let each verb have associated with it a structure that describes its meaning. This is the approach taken in conceptual dependency theory. Even for selecting simple structures, such as those representing the meanings of individual words, though, this approach may not be adequate, since many words may have several distinct meanings. For example, the word "fly" has a different meaning in each of the following sentences:
- John flew to New York. (He rode in a plane from one place to another.)
- John flew a kite. (He held a kite that was up in the air.)
- John flew down the street. (He moved very rapidly.)
- John flew into a rage. (An idiom)
Another problem with this approach is that it is only useful when there is an English description of the problem to be solved.
2. Consider each major concept as a pointer to all of the structures (such as scripts) in which it might be involved. This may produce several sets of prospective structures. For example, the concept Steak might point to two scripts, one for restaurant and one for supermarket. The concept Bill might point to a restaurant and a shopping script. Take the intersection of those sets to get the structure(s), preferably precisely one, that involves all the content words; a small sketch of this intersection method appears after this list. Given the pointers just described and the story about John's trip to Steak and Ale, the restaurant script would be evoked. One important problem with this method is that if the problem description contains any even slightly extraneous concepts, then the intersection of their associated structures will be empty. This might occur if we had said, for example, "John rode his bicycle to Steak and Ale last night." Another problem is that it may require a great deal of computation to compute all of the possibility sets and then to intersect them. However, if computing such sets and intersecting them could be done in parallel, then the time required to produce an answer would be reasonable even if the total number of computations is large.
3. Locate one major clue in the problem description and use it to select an initial structure. As other clues appear, use them to refine the initial selection or to make a completely new one if necessary. The major problem with this method is that in some situations there is not an easily identifiable major clue. A second problem is that it is necessary to anticipate which clues are going to be important and which are not. But the relative importance of clues can change dramatically from one situation to another. For example, in many contexts, the colour of the objects involved is not important. But if we are told “The light turned red," then the colour of the light is the most important feature to consider.
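Here is the promised toy sketch of the intersection method of approach 2; the concept-to-script table is an illustrative assumption.

SCRIPTS_FOR = {
    "steak": {"restaurant", "supermarket"},
    "bill":  {"restaurant", "shopping"},
    "order": {"restaurant"},
}

def select_structure(concepts):
    # Intersect the candidate-script sets of all the content concepts.
    candidates = None
    for c in concepts:
        s = SCRIPTS_FOR.get(c, set())
        candidates = s if candidates is None else candidates & s
    return candidates or set()

print(select_structure(["steak", "bill", "order"]))   # {'restaurant'}
print(select_structure(["steak", "bicycle"]))         # set(): extraneous concept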
Unfortunately, none of these proposals seems to be the complete answer to the problem. And the more complex the knowledge structures are, the harder it is to tell when a particular one is appropriate.
Revising the Choice When Necessary
Depending on the representation we are using, the details of the matching process will vary. It may require variables to be bound to objects. It may require attributes to have their values compared. In any case, if values that satisfy the required restrictions as imposed by the knowledge structure can be found, they are put into the appropriate places in the structure. If no appropriate values can be found, then a new structure must be selected. The way in which the attempt to instantiate this first structure failed may provide useful cues as to which one to try next. If, on the other hand, appropriate values can be found, then the current structure can be taken to be appropriate for describing the current situation. But, of course, that situation may change. Then information about what happened (for example, we walked around the room we were looking at) may be useful in selecting a new structure to describe the revised situation.
When the process runs into a snag, though, it is often not necessary to abandon the effort and start over. Rather, there are a variety of things that can be done:
Select the fragments of the current structure that do correspond to the situation and match them against candidate alternatives. Choose the best match. If the current structure was at all close to being appropriate, much of the work that has been done to build substructures to fit into it will be preserved.
Make an excuse for the current structure's failure and continue to use it. For example, a proposed chair with only three legs might simply be broken. Or there might be another object in front of it which occludes one leg. Part of the structure should contain information about the features for which it is acceptable to make excuses. Also, there are general heuristics, such as the fact that a structure is more likely to be appropriate if a desired feature is missing (perhaps because it is hidden from view) than if an inappropriate feature is present. For example, a person with one leg is more plausible than a person with a tail.
Refer to specific stored links between structures to suggest new directions in which to explore. An example of this sort of linking among a set of frames is shown in the similarity network shown in Figure 3.11.
If the knowledge structures are stored in an isa hierarchy, traverse upward in it until a structure is found that is sufficiently general that it does not conflict with the evidence. Either use this structure if it is specific enough to provide the required knowledge or consider creating a new structure just below the matching one.
Student Activity 3.2
Before reading the next section, answer the following questions.
1. Write short notes on inferential knowledge and procedural knowledge.
2. What are the main issues in knowledge representation?
3. What is the use of “instance” and “isa” attributes?
If your answers are correct, then proceed to the next section.
Top
The Frame Problem
We have seen several methods for representing knowledge that allow us to form complex state descriptions for a search program. Our next concern is how to represent efficiently sequences of problem states that arise from a search process. For complex, ill-structured problems, this can be a serious matter.
Figure 3.11: A Similarity Net
Consider the world of a household robot. There are many objects and relationships in the world, and a state description must somehow include facts like on(Plant12, Table34), under(Table34, Window13), and in(Table34, Room15). One strategy is to store each state description as a list of such facts. Most of the facts will not change from one state to another, yet each fact will be represented once at every node, and we will quickly run out of memory. Furthermore, we will spend the majority of our time creating these nodes and copying these facts, most of which do not change often, from one node to another. For example, in the robot world, we could spend a lot of time recording above(Ceiling, Floor) at every node. All of this is in addition to the real problem of figuring out which facts should be different at each node.
This whole problem of representing the facts that change as well as those that do not is known as the frame problem. In some domains, the only hard part is representing all the facts. In others, though, figuring out which ones change is nontrivial. For example, in the robot world, there might be a table with a plant on it under the window. Suppose we move the table to the center of the room. We must also infer that the plant is now in the center of the room too but that the window is not.
To support this kind of reasoning, some systems make use of an explicit set of axioms called frame axioms, which describe all the things that do not change when a particular operator is applied in state n to produce state n+1. Thus in the robot domain, we might write axioms such as,
color(x, y, s1) ∧ move(x, s1, s2) → color(x, y, s2)
which can be read as, "If x has colour y in state s1 and the operation of moving x is applied in state s1 to produce state s2, then the colour of x in s2 is still y." Unfortunately, in any complex domain, a huge number of these axioms becomes necessary. An alternative approach is to make the assumption that the only things that change are the things that must. By "must" here we mean that the change is either required explicitly by the axioms that describe the operator or that it follows logically from some change that is asserted explicitly. This idea of circumscribing the set of unusual things is a very powerful one; it can be used as a partial solution to the frame problem and as a way of reasoning with incomplete knowledge.
Let us return briefly to the problem of representing a changing problem state. We can do this simply by starting with a description of the initial state and then making changes to that description as indicated by the rules we apply. This solves the problem of the wasted space and time involved in copying the information for each node. And it works fine until the first time the search has to backtrack. Then, unless all the changes that were made can simply be ignored, we are faced with the problem of backing up to some earlier node. But how do we know what changes in the problem state description need to be undone? There are two ways this problem can be solved.
- Do not modify the initial state description at all. At each node, store an indication of the specific changes that should be made at this node. Whenever it is necessary to refer to the description of the current problem state, look at the initial state description and also look back through all the nodes on the path from the start state to the current state. This approach makes backtracking very easy, but it makes referring to the state description fairly complex.
- Modify the initial state description as appropriate, but also record at each node an indication of what to do to undo the move should it ever be necessary to backtrack through the node. Then, whenever it is necessary to backtrack, check each node along the way and perform the indicated operations on the state description.
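A minimal Python sketch of the second approach follows, with an illustrative move in the household-robot world: the state is modified in place, and each move pushes enough information onto a stack to undo it on backtracking.

state = {("on", "Plant12", "Table34"), ("in", "Table34", "Room15")}
undo_stack = []

def apply_move(add, remove):
    # Change the state, remembering how to reverse the change.
    state.difference_update(remove)
    state.update(add)
    undo_stack.append((remove, add))

def backtrack():
    removed, added = undo_stack.pop()
    state.difference_update(added)    # drop what the move added
    state.update(removed)             # restore what the move removed

apply_move(add={("in", "Table34", "Center")},
           remove={("in", "Table34", "Room15")})
backtrack()
print(("in", "Table34", "Room15") in state)   # True: state restored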

Student Activity 3.3
Answer the following questions.
1. What do you understand by a frame? How is it useful in knowledge representation?
2. What are frame axioms, and where are they used?
Summary
- Facts: truths in some relevant world. These are the things we want to represent.
- The forward representation mapping maps from facts to representations. The backward representation mapping goes the other way, from representations to facts.
- Representations of facts in some chosen formalism are the things we can actually manipulate.
- The relational knowledge corresponds to a set of attributes and associated values that together describe the objects of the knowledge base.
- It is possible to augment the basic representation with inference mechanisms that operate on the structure of the representation. For this to be effective, the structure must be designed to correspond to the inference mechanisms that are desired.
- One of the most useful forms of inference is property inheritance, in which elements of specific classes inherit attributes and values from more general classes in which they are included.
- In order to support property inheritance, objects must be organized into classes and classes must be arranged in a generalization hierarchy.
Thanks
