Knowledge Representation
Learning Objectives
After reading this unit you should appreciate the following:
Representation and Mapping
Approaches to Knowledge Representation
The Frame Problem
We discussed the role that knowledge plays in AI systems in earlier units. In the units since then, however, we have paid little attention to knowledge and its importance, as we focused on basic frameworks for building search-based problem-solving programs. These methods are sufficiently general that we have been able to discuss them without reference to how the knowledge they need is to be represented. Although these methods are useful and form the skeleton of many of the methods we are about to discuss, their problem-solving power is limited precisely because of their generality. As we look in more detail at ways of representing knowledge, it becomes clear that particular knowledge representation models allow for more specific, more powerful problem-solving mechanisms that operate on them. In this unit, we return to the topic of knowledge used within programs and examine specific techniques that can be used for representing and manipulating knowledge.
Representations and Mappings
Artificial Intelligence needs both a large amount of knowledge and some mechanisms for manipulating that knowledge to create solutions to new problems. A variety of ways of representing knowledge have been exploited in AI programs. But before we can talk about them individually, we must consider the following point that pertains to all discussions of representation, namely that we are dealing with two different kinds of entities:
Facts: truths in some relevant world. These are the things we want to represent.
Representations of facts in some chosen formalism. These are the things we will actually be able to manipulate.
Figure 3.1: Mappings between Facts and Representations
One way to think of structuring these entities is as two levels:
The knowledge level, at which facts (including each agent's behaviors and current goals) are described.
The symbol level, at which representations of objects at the knowledge level are defined in terms of symbols that can be manipulated by programs.
Rather than thinking of one level on top of another, we will focus on facts, on representations, and on the two-way mappings that must exist between them, as shown in Figure 3.1. We will call these links representation mappings. The forward representation mapping maps from facts to representations. The backward representation mapping goes the other way, from representations to facts.
One representation of facts is so common that it deserves special mention: natural language sentences. Regardless of the representation for facts that we use in a program, we may also need to be concerned with an English representation of those facts in order to facilitate getting information into and out of the system. In this case, we must also have mapping functions from English sentences to the representation we are actually going to use and from it back to sentences. Figure 3.1 shows how these three kinds of objects relate to each other.
Let's look at a simple example using mathematical logic as the representational formalism. Consider the English sentence:
Tom is a cat.
The fact represented by this English sentence can also be represented in logic as:
cat(Tom)
Suppose that we also have a logical representation of the fact that all cats have tails:
"x : cats (x)à hastail(x)
Then, using the deductive mechanisms of logic, we may generate the new representation object:
hastail(Tom)
Using an appropriate backward mapping function, we could then generate the English sentence:
Tom has a tail.
Or we could make use of this representation of a new fact to cause us to take some appropriate action or to derive representations of additional facts.
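To make the deduction step concrete, here is a minimal Python sketch (ours, not part of the original text) that applies the rule cat(x) → hastail(x) to a store of facts; the predicate and constant names simply mirror the logic above.

    # Facts are (predicate, argument) pairs; applying the rule adds
    # hastail(x) for every x known to be a cat.
    facts = {("cat", "Tom")}

    def apply_cat_rule(facts):
        derived = {("hastail", x) for (pred, x) in facts if pred == "cat"}
        return facts | derived

    facts = apply_cat_rule(facts)
    print(facts)  # now contains ('hastail', 'Tom') as well as ('cat', 'Tom')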
It is important to keep in mind that usually the available mapping functions are not one-to-one. In fact, they are often not even functions but rather many-to-many relations. (In other words, each object in the domain may map to several elements in the range, and several elements in the domain may map to the same element of the range.) This is particularly true of the mappings involving English representations of facts. For example, the two sentences "All cats have tails" and "Every cat has a tail" could both represent the same fact, namely that every cat has at least one tail. On the other hand, the former could represent either the fact that every cat has at least one tail or the fact that each cat has several tails. The latter may represent either the fact that every cat has at least one tail or the fact that there is a tail that every cat has. As we will see shortly, when we try to convert English sentences into some other representation, such as logical propositions, we must first decide what facts the sentences represent and then convert those facts into the new representation.
The starred links of Figure 3.1 are key components of the design of any knowledge-based program. Sometimes, a good representation makes the operation of a reasoning program not only correct but also trivial. A well-known example of this occurs in the context of the mutilated checkerboard problem, which can be stated as follows:
The Mutilated Checkerboard Problem. Consider a normal checkerboard from which two squares in opposite corners have been removed. The task is to cover all the remaining squares exactly with dominoes, each of which covers two squares. No overlapping, either of dominoes on top of each other or of dominoes over the boundary of the mutilated board, is allowed. Can this task be done?
To solve this problem we can enumerate all possible tilings to see if one works. But suppose one wants to be cleverer. Figure 3.2 shows three ways in which the mutilated checkerboard could be represented (to a person). The first representation does not directly suggest the answer to the problem. The second may; the third does, when combined with the single additional fact that each domino must cover exactly one white square and one black square. Even for human problem solvers, a representation shift may make an enormous difference in problem-solving effectiveness.
Figure 3.2: Three Representations of a Mutilated Checkerboard
The mapping between the facts and representations which was shown in Figure 3.1 has been expanded in Figure 3.3. The dotted line across the top represents the abstract reasoning process that a program is intended to model. The solid line across the bottom represents the concrete reasoning process that a particular program performs. This program successfully models the abstract process to the extent that, when the backward representation mapping is applied to the program's output, the appropriate final facts are actually generated. If either the program’s operation or one of the representation mappings is not faithful to the problem that is being modeled, then the final facts will probably not be the desired ones. The key role that is played by the nature of the representation mapping is apparent from this figure. If no good mapping can be defined for a problem, then no matter how good the program to solve the problem is, it will not be able to produce answers that correspond to real answers to the problem.
Figure 3.3 looks very much like the figure that might appear in a general programming book as a description of the relationship between an abstract data type (such as a set) and a concrete implementation of that type (e.g. as a linked list of elements). But there are some differences. For example, in data type design it is expected that the mapping that we are calling the backward representation mapping is a function (i.e., every representation corresponds to only one fact) and that it is onto (i.e., there is at least one representation for every fact). Unfortunately, in many AI domains, it may not be possible to come up with such a representation mapping, and we may have to live with one that gives less ideal results. But the main idea of what we are doing is the same as what programmers always do, namely to find concrete implementations of abstract concepts.
Figure 3.3: Representation of Fact
Student Activity 3.1
Before reading the next section, answer the following questions.
1. Discuss different ways of knowledge representation.
2. Represent the following facts in logic:
John plays football.
Jim loves Jani.
Caesar was a ruler.
If your answers are correct, then proceed to the next section.
Approaches to Knowledge Representation
A good system for the representation of knowledge in a particular domain should possess the following four properties:
Representational Adequacy – the ability to represent all of the kinds of knowledge that are needed in that domain.
Inferential Adequacy – the ability to manipulate the representational structures in such a way as to derive new structures corresponding to new knowledge inferred from old.
Inferential Efficiency – the ability to incorporate into the knowledge structure additional information that can be used to focus the attention of the inference mechanisms in the most promising directions.
Acquisitional Efficiency – the ability to acquire new information easily. The simplest case involves direct insertion, by a person, of new knowledge into the database. Ideally, the program itself would be able to control knowledge acquisition.
Multiple techniques for knowledge representation exist because, unfortunately, no single system that optimizes all of these capabilities for all kinds of knowledge has yet been found. Many programs rely on more than one technique. The most important of these techniques are described in detail as we proceed. But in this section, we provide a simple, example-based introduction to the important ideas.
Simple Relational Knowledge
The simplest way to represent declarative facts is as a set of relations of the same sort, used in database systems. Figure 3.4 shows an example of such a relational system.
Figure 3.4: Simple Relational Knowledge
The reason that this representation is simple is that standing alone it provides very weak inferential capabilities. But knowledge represented in this form may serve as the input to more powerful inference engines. For example, given just the facts of Figure 3.4, it is not possible even to answer the simple question, "Who is the heaviest player?" But if a procedure for finding the heaviest player is provided, then these facts will enable the procedure to compute an answer. If, instead, we are provided with a set of rules for deciding which hitter to put up against a given pitcher (based on right- and left-handedness, say), then this same relation can provide at least some of the information required by those rules.
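As an illustration, the following Python sketch stores relational facts as rows and adds an external procedure that supplies the inferential power the bare relation lacks. The player data are illustrative stand-ins, not the actual contents of Figure 3.4.

    # Each row is one tuple of the relation: (name, height, weight, bats).
    players = [
        ("Hank-Aaron",  "6-0", 180, "Right"),
        ("Willie-Mays", "5-11", 170, "Right"),
        ("Babe-Ruth",   "6-2", 215, "Left"),
    ]

    def heaviest_player(rows):
        # The relation alone cannot answer "Who is the heaviest player?";
        # this procedure, given the relation, can.
        return max(rows, key=lambda row: row[2])[0]

    print(heaviest_player(players))  # Babe-Ruth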
Providing support for relational knowledge is what database systems are designed to do. Thus we do not need to discuss this kind of knowledge representation structure further here. The practical issues that arise in linking a database system that provides this kind of support to a knowledge representation system that provides some of the other capabilities that we are about to discuss have already been solved in several commercial products.
Inheritable Knowledge
It is possible to augment the basic representation with inference mechanisms that operate on the structure of the representation. For this to be effective the structure must be designed to correspond to the inference mechanisms that are desired. One of the most useful forms of inference is property inheritance, in which elements of specific classes inherit attributes and values from more general classes in which they are included.
Objects must be organized into classes and classes must be arranged in a generalization hierarchy in order to support property inheritance. Figure 3.5 shows some additional baseball knowledge inserted into a structure that is so arranged. Lines represent attributes. Boxed nodes represent objects and values of attributes of objects. These values can also be viewed as objects with attributes and values, and so on. The arrows on the lines point from an object to its value along the corresponding attribute line. The structure shown in the figure is a slot-and-filler structure. It may also be called a semantic network or a collection of frames. In the latter case, each individual frame represents the collection of attributes and values associated with a particular node. Figure 3.6 shows the node for baseball player displayed as a frame.
Figure 3.5: Inheritable Knowledge
Usually the use of the term frame system implies somewhat more structure on the attributes and the inference mechanisms that are available to apply to them than does the term semantic network.
All of the objects and most of the attributes shown in this example have been chosen to correspond to the baseball domain, and they have no general significance. The two exceptions are the attribute isa, which is used to show class inclusion, and the attribute instance, which is used to show class membership.
Figure 3.6: Viewing a Node as a Frame
These two specific (and generally useful) attributes provide the basis for property inheritance as an inference technique. Using this technique, the knowledge base can support retrieval both of facts that have been explicitly stored and of facts that can be derived from those that are explicitly stored. An idealized form of the property inheritance algorithm can be stated as follows:
Algorithm: Property Inheritance
To retrieve a value V for attribute A of an instance object O:
1. Find O in the knowledge base.
2. If there is a value there for the attribute A, report that value.
3. Otherwise, see if there is a value for the attribute instance. If not, then fail.
4. Otherwise, move to the node corresponding to that value and look for a value for the attribute A. If one is found, report it.
5. Otherwise, do until there is no value for the isa attribute or until an answer is found:
a. Get the value of the isa attribute and move to that node.
b. See if there is a value for the attribute A. If there is, report it.
This procedure is simplistic. It does not say what we should do if there is more than one value of the instance or isa attribute. But it does describe the basic mechanism of inheritance. We can apply this procedure to our example knowledge base to derive answers to the following queries:
team(Pee-Wee-Reese) = Brooklyn-Dodgers. This attribute had a value stored explicitly in the knowledge base.
batting-average(Three-Finger-Brown) = .106. Since there is no value for batting average stored explicitly for Three Finger Brown, we follow the instance attribute to Pitcher and extract the value stored there. Now we observe one of the critical characteristics of property inheritance, namely that it may produce default values that are not guaranteed to be correct but that represent "best guesses" in the face of a lack of more precise information. In fact, in 1906, Brown's batting average was .204.
height(Pee-Wee-Reese) = 6-1. This represents another default inference. Notice here that because we get to it first, the more specific fact about the height of baseball players overrides a more general fact about the height of adult males.
bats(Three-Finger-Brown) = Right. To get a value for the attribute bats required going up the isa hierarchy to the class Baseball-Player. But what we found there was not a value but a rule for computing a value. This rule required another value (that for handed) as input. So the entire process must be begun again recursively to find a value for handed. This time, it is necessary to go all the way up to Person to discover that the default value for handedness for people is Right. Now the rule for bats can be applied, producing the result Right. In this case, that turns out to be wrong, since Brown is a switch hitter (i.e., he can hit both left- and right-handed).
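The following Python sketch shows how the idealized inheritance algorithm produces the answers just described, including the recursive evaluation of the bats rule. The node contents are abridged from Figures 3.5 and 3.6, and the exact values should be treated as illustrative.

    kb = {
        "Person":          {"isa": None, "handed": "Right"},
        "Adult-Male":      {"isa": "Person", "height": "5-10"},
        "Baseball-Player": {"isa": "Adult-Male", "height": "6-1",
                            # a stored rule, not a stored value (see Figure 3.6)
                            "bats": lambda obj: get(obj, "handed")},
        "Pitcher":         {"isa": "Baseball-Player", "batting-average": ".106"},
        "Fielder":         {"isa": "Baseball-Player", "batting-average": ".262"},
        "Three-Finger-Brown": {"instance": "Pitcher"},
        "Pee-Wee-Reese":   {"instance": "Fielder", "team": "Brooklyn-Dodgers"},
    }

    def get(obj, attr):
        # Steps 1-5 of the property-inheritance algorithm.
        node = obj
        while node is not None:
            value = kb[node].get(attr)
            if value is not None:
                # A stored rule is restarted recursively from the original object.
                return value(obj) if callable(value) else value
            # Step 3/4: move from an instance to its class, then climb isa links.
            node = kb[node].get("instance") or kb[node].get("isa")
        return None  # fail

    print(get("Three-Finger-Brown", "batting-average"))  # .106 (default)
    print(get("Three-Finger-Brown", "bats"))             # Right (rule + handed)
    print(get("Pee-Wee-Reese", "height"))                # 6-1 (most specific wins)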
Inferential Knowledge
Figure 3.7 shows two examples of the use of first-order predicate logic to represent additional knowledge about baseball.
Figure 3.7: Inferential Knowledge
Of course, this knowledge is useless unless there is also an inference procedure that can exploit it. The required inference procedure now is one that implements the standard logical rules of inference. There are many such procedures, some of which reason forward from given facts to conclusions, others of which reason backward from desired conclusions to given facts. One of the most commonly used of these procedures is resolution, which exploits a proof-by-contradiction strategy.
In general, in fact, all of the techniques we are describing here should not be regarded as complete and incompatible ways of representing knowledge. Instead, they should be viewed as building blocks of a complete representational system.
Procedural Knowledge
Another, equally useful, kind of knowledge is operational, or procedural knowledge, that specifies what to do when. Procedural knowledge can be represented in programs in many ways. The most common way is simply as code (in some programming language such as LISP) for doing something. The machine uses the knowledge when it executes the code to perform a task. Unfortunately, this way of representing procedural knowledge gets low scores with respect to the properties of inferential adequacy (because it is very difficult to write a program that can reason about another program's behaviour) and acquisitional efficiency (because the process of updating and debugging large pieces of code becomes unwieldy).
Figure 3.8: Using LISP Code to Define a Value
As an extreme example, compare the representation of the way to compute the value of bats shown in Figure 3.6 to the one in LISP shown in Figure 3.8. Although the LISP one will work given a particular way of storing attributes and values in a list, it does not lend itself to being reasoned about in the same straightforward way as the representation of Figure 3.6 does. The LISP representation is slightly more powerful since it makes explicit use of the name of the node whose value for handed is to be found. But if this matters, the simpler representation can be augmented to do this as well.
Because of this difficulty in reasoning with LISP, attempts have been made to find other ways of representing procedural knowledge so that it can relatively easily be manipulated both by other programs and by people.
The most commonly used technique for representing procedural knowledge in AI programs is the use of production rules. Figure 3.9 shows an example of a production rule that represents a piece of operational knowledge typically possessed by a baseball player.
Production rules, particularly ones that are augmented with information on how they are to be used, are more procedural than are the other representation methods discussed in this chapter. But making a clean distinction between declarative and procedural knowledge is difficult. Although at an intuitive level such a distinction makes some sense, at a formal level it disappears. In fact, as you can see, the structure of the declarative knowledge of Figure 3.7 is not substantially different from that of the operational knowledge of Figure 3.9. The important difference is in how the knowledge is used by the procedures that manipulate it.
Figure 3.9: Procedural Knowledge as Rules
Issues in Knowledge Representation
Before embarking on a discussion of specific mechanisms that have been used to represent various kinds of real-world knowledge, we need briefly to discuss several issues that cut across all of them:
Are any attributes of objects so basic that they occur in almost every problem domain? If there are, we need to make sure that they are handled appropriately in each of the mechanisms we propose. If such attributes exist, what are they?
Are there any important relationships that exist among attributes of objects?
At what level should knowledge be represented? Is there a good set of primitives into which all knowledge can be broken down? Is it helpful to use such primitives?
How should sets of objects be represented?
Given a large amount of knowledge stored in a database, how can relevant parts be accessed when they are needed?
We will talk about each of these questions briefly in the next five sections.
Important Attributes
Instance and isa are two attributes that are of very general significance. These attributes are important because they support property inheritance. They are called a variety of things in AI systems, but the names do not matter. What does matter is that they represent class membership and class inclusion and that class inclusion is transitive. In slot-and-filler systems, these attributes are usually represented explicitly in a way much like that shown in Figures 3.5 and 3.6. In logic-based systems, these relationships may be represented this way or they may be represented implicitly by a set of predicates describing particular classes.
Relationships among Attributes
The attributes that we use to describe objects are themselves entities that we represent. What properties do they have independent of the specific knowledge they encode? There are four such properties that deserve mention here:
Inverses
Existence in an isa hierarchy
Techniques for reasoning about values
Single-valued attributes
Inverses
Entities in the world are related to each other in many different ways. But as soon as we decide to describe those relationships as attributes, we commit to a perspective in which we focus on one object and look for binary relationships between it and others. Attributes are those relationships. So, for example, in Figure 3.5, we used the attributes instance, isa, and team. Each of these was shown in the figure with a directed arrow, originating at the object that was being described and terminating at the object representing the value of the specified attribute. But we could equally well have focused on the object representing the value. If we do that, then there is still a relationship between the two entities, although it is a different one since the original relationship was not symmetric (although some relationships, such as sibling, are). In many cases, it is important to represent this other view of relationships. There are two good ways to do this.
The first is to represent both relationships in a single representation that ignores focus. Logical representations are usually interpreted as doing this. For example, the assertion:
team(Pee-Wee-Reese, Brooklyn-Dodgers)
can equally easily be interpreted as a statement about Pee Wee Reese or about the Brooklyn Dodgers. How it is actually used depends on the other assertions that a system contains.
The second approach is to use attributes that focus on a single entity but to use them in pairs, one the inverse of the other. In this approach, we would represent the team information with two attributes:
one associated with Pee Wee Reese
team = Brooklyn-Dodgers
one associated with Brooklyn Dodgers
team-members = Pee-Wee-Reese,…….
This is the approach that is taken in semantic net and frame based systems. When it is used, it is usually accompanied by a knowledge acquisition tool that guarantees the consistency of inverse slots by forcing them to be declared, and then checking each time a value is added to one attribute that the corresponding value is added to the inverse.
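A minimal Python sketch of this bookkeeping (with assumed attribute names) might look as follows: declaring team and team-members as inverses once lets the insertion routine keep both directions consistent automatically.

    from collections import defaultdict

    inverses = {"team": "team-members", "team-members": "team"}
    slots = defaultdict(lambda: defaultdict(list))  # object -> attribute -> values

    def add_value(obj, attr, value):
        # Add obj.attr = value and, if attr has a declared inverse,
        # add value.inverse = obj as well.
        slots[obj][attr].append(value)
        inv = inverses.get(attr)
        if inv is not None and obj not in slots[value][inv]:
            slots[value][inv].append(obj)

    add_value("Pee-Wee-Reese", "team", "Brooklyn-Dodgers")
    print(slots["Brooklyn-Dodgers"]["team-members"])  # ['Pee-Wee-Reese']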
An Isa Hierarchy of Attributes
There are attributes and specializations of attributes, just as there are classes of objects and specialized subsets of those classes. Consider, for example, the attribute height. It is actually a specialization of the more general attribute physical-size, which is, in turn, a specialization of physical-attribute. These generalization-specialization relationships are important for attributes for the same reason that they are important for other concepts: they support inheritance. In the case of attributes, they support inheriting information about such things as constraints on the values that the attribute can have and mechanisms for computing those values.
Techniques for Reasoning about Values
Sometimes values of attributes are specified explicitly when a knowledge base is created. We saw several examples of that in the baseball example of Figure 3.5. But often the reasoning system must reason about values it has not been given explicitly. Several kinds of information can play a role in this reasoning, including:
Information about the type of the value. For example, the value of height must be a number measured in a unit of length.
Constraints on the value often stated in terms of related entities. For example, the age of a person cannot be greater than the age of either of that person's parents.
Rules for computing the value when it is needed. We showed an example of such a rule in Figure 3.5 for the bats attribute. These rules are called backward rules. Such rules have also been called if-needed rules.
Rules that describe actions that should be taken, if a value ever becomes known. These rules are called forward rules, or sometimes if-added rules.
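The following sketch (names and storage scheme assumed, not from the text) illustrates the last two kinds of information: a backward if-needed rule that computes bats from handed only on demand, and a forward if-added rule that fires as soon as a team value arrives.

    values, if_needed, if_added = {}, {}, {}

    def get_value(obj, attr):
        # Use a stored value if present; otherwise fire the if-needed rule.
        if (obj, attr) in values:
            return values[(obj, attr)]
        rule = if_needed.get(attr)
        return rule(obj) if rule else None

    def set_value(obj, attr, val):
        # Store a value, then fire any if-added rule watching this attribute.
        values[(obj, attr)] = val
        rule = if_added.get(attr)
        if rule:
            rule(obj, val)

    # bats is computed on demand from handed (a backward, if-needed rule)...
    if_needed["bats"] = lambda obj: get_value(obj, "handed")
    # ...and a newly learned team triggers an action (a forward, if-added rule).
    if_added["team"] = lambda obj, val: print(f"{obj} now plays for {val}")

    set_value("Three-Finger-Brown", "handed", "Right")
    print(get_value("Three-Finger-Brown", "bats"))  # Right
    set_value("Three-Finger-Brown", "team", "Chicago-Cubs")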
Single-Valued Attributes
A specific but very useful kind of attribute is one that is guaranteed to take a unique value. For example, a baseball player can, at any one time, have only a single height and be a member of only one team. If there is already a value present for one of these attributes and a different value is asserted, then one of two things has happened. Either a change has occurred in the world or there is now a contradiction in the knowledge base that needs to be resolved. Knowledge-representation systems have taken several different approaches to providing support for single-valued attributes, including:
Introduce an explicit notation for temporal interval. If two different values are ever asserted for the same temporal interval, signal a contradiction automatically.
Assume that the only temporal interval that is of interest is now. So if a new value is asserted, replace the old value.
Provide no explicit support. Logic-based systems are in this category. But in these systems, knowledge-base builders can add axioms that state that if an attribute has one value then it is known not to have all other values.
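The following Python sketch (with an illustrative set of single-valued attributes) combines the first two ideas above: the only interval of interest is now, a second value for a single-valued attribute signals a contradiction, and the old value is replaced only if we assume the world has changed.

    single_valued = {"height", "team"}
    kb = {}

    def assert_value(obj, attr, value, world_changed=False):
        current = kb.get((obj, attr))
        if attr in single_valued and current is not None and current != value:
            if not world_changed:
                raise ValueError(
                    f"Contradiction: {obj}.{attr} is {current}, asserted {value}")
            # Otherwise assume a change in the world: keep only the new value.
        kb[(obj, attr)] = value

    assert_value("Pee-Wee-Reese", "team", "Brooklyn-Dodgers")
    assert_value("Pee-Wee-Reese", "team", "Los-Angeles-Dodgers",
                 world_changed=True)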
Choosing the Granularity of Representation
Regardless of the particular formalism, it is necessary to answer the question "At what level of detail should the world be represented?" Should there be a small number of low-level primitives, or a larger number of concepts covering a range of granularities? A brief example illustrates the problem. Suppose we are interested in the following fact:
John spotted Sue.
We could represent this as
spotted(agent(John), object(Sue))
Such a representation would make it easy to answer questions such as:
Who spotted Sue?
But now suppose we want to know:
Did John see Sue?
The obvious answer is "yes," but given only the one fact we have, we cannot discover that answer. We could, of course, add other facts, such as
spotted(x, y) → saw(x, y)
We could then infer the answer to the question.
An alternative solution to this problem is to represent explicitly, in the representation of the fact itself, that spotting is really a special type of seeing. We might write something such as:
saw(agent(John), object(Sue), timespan(briefly))
Here we have broken the idea of spotting apart into more primitive concepts of seeing and timespan. Using this representation, the fact that John saw Sue is immediately accessible. But the fact that he spotted her is more difficult to get to. The major advantage of converting all statements into a representation in terms of a small set of primitives is that the rules that are used to derive inferences from that knowledge need be written only in terms of the primitives rather than in terms of the many ways in which the knowledge may originally have appeared. Thus what is really being argued for is simply some sort of canonical form.
One of the several arguments which go against the use of low-level primitives is that simple high-level facts may require a lot of storage when broken down into primitives. Much of that storage is really wasted since the low-level rendition of a particular high-level concept will appear many times, once for each time the high-level concept is referenced.
Some of these limitations can be addressed by conceptual parsing, a strategy for finding both the structure and the meaning of a sentence in one step. Conceptual parsing is driven by a dictionary that describes the meanings of words as conceptual dependency (CD) structures. The first step in mapping a sentence into its CD representation involves a syntactic processor that extracts the main noun and verb. It also determines the syntactic category and aspectual class of the verb (i.e., stative, transitive, or intransitive). The conceptual processor then takes over. It makes use of a verb-ACT dictionary, which contains an entry for each environment in which a verb can appear. For example, suppose that actions are being represented as combinations of a small set of primitive actions. Then the fact that John punched Mary might be represented as shown in Figure 3.10(a). The representation says that there was physical contact between John's fist and Mary. The contact was caused by John propelling his fist toward Mary, and in order to do that John first went to where Mary was. But suppose we also know that Mary punched John. Then we must also store the structure shown in Figure 3.10(b). If, however, punching were represented simply as punching, then most of the detail of both structures could be omitted from the structures themselves. It could instead be stored just once in a common representation of the concept of punching.
A second but related problem is that substantial work must be done to reduce the knowledge into primitive form if knowledge is initially presented to the system in a relatively high-level form, such as English. Both in understanding language and in interpreting the world that we see, many things appear that later turn out to be irrelevant. For the sake of efficiency, it may be desirable to store these things at a very high level and then to analyze in detail only those inputs that appear to be important.
A third problem with the use of low-level primitives is that in many domains, it is not at all clear what the primitives should be. And even in domains in which there may be an obvious set of primitives, there may not be enough information present in each use of the high-level constructs to enable them to be converted into their primitive components. When this is true, there is no way to avoid representing facts at a variety of granularities.
Consider, for example, the problem of representing family relationships. There exists at least one obvious set of primitives: mother, father, son, daughter, and possibly brother and sister. But now suppose we are told that Mary is Sue's cousin. An attempt to describe the cousin relationship in terms of the primitives could produce any of the following interpretations:
Mary = daughter(brother(mother(Sue)))
Mary = daughter(sister(mother(Sue)))
Mary = daughter(brother(father(Sue)))
Mary = daughter(sister(father(Sue)))
Figure 3.10: Redundant representations
As illustrated, the problem of choosing the correct granularity of representation for a particular body of knowledge is not easy. Clearly, the lower the level we choose, the less inference is required to reason with it in some cases, but the more inference is required to create the representation from English and the more room it takes to store, since many inferences will be represented many times. The answer for any particular task domain must come to a large extent from the domain itself: to what use is the knowledge to be put?
One way of looking at the question of whether there exists a good set of low-level primitives is that it is a question of the existence of a unique representation. Does there exist a single, canonical way in which large bodies of knowledge can be represented independently of how they were originally stated? Another, closely related, uniqueness question asks whether individual objects can be represented uniquely and independently of how they are described.
The phrase Evening Star names a certain large physical object of spherical form, which is hurtling through space some scores of millions of miles from here. The phrase Morning Star names the same thing, as was probably first established by some observant Babylonian. But the two phrases cannot be regarded as having the same meaning; otherwise that Babylonian could have dispensed with his observations and contented himself with reflecting on the meaning of his words. The meanings, then, being different from one another, must be other than the named object, which is one and the same in both cases. For a program to be able to reason as did the Babylonian, it must be able to handle several distinct representations that turn out to stand for the same object.
Representing Sets of Objects
There are several reasons why we must be able to represent sets of objects. One is that there are some properties that are true of sets that are not true of the individual members of a set. As examples, consider the assertions that are being made in the sentences "There are more sheep than people in Australia" and "English speakers can be found all over the world." The only way to represent the facts described in these sentences is to attach assertions to the sets representing people, sheep, and English speakers, since, for example, no single English speaker can be found all over the world.
Secondly if a property is true of all (or even most) elements of a set, then it is more efficient to associate it once with the set rather than to associate it explicitly with every element of the set. We have already looked at ways of doing that, both in logical representations through the use of the universal quantifier and in slot-and-filler structures, where we used nodes to represent sets and inheritance to propagate set-level assertions down to individuals. As we consider ways to represent sets, we will want to consider both of these uses of set-level representations. We will also need to remember that the two uses must be kept distinct. Thus if we assert something like large(Elephant), it must be clear whether we are asserting some property of the set itself or some property that holds for individual elements of the set.
The simplest way in which a set may be represented is just by a name. This simple representation does make it possible to associate predicates with sets. But it does not, by itself, provide any information about the set it represents. It does not, for example, tell how to determine whether a particular object is a member of the set or not.
There are two ways to state a definition of a set and its elements. The first is to list the members. Such a specification is called an extensional definition. The second is to provide a rule that, when a particular object is evaluated, returns true or false depending on whether the object is in the set or not. Such a rule is called an intensional definition. For example, an extensional description of the set of our sun's planets on which people live is {Earth}. An intensional description is
{x : sun-planet(x) ∧ human-inhabited(x)}
For simple sets, it may not matter, except possibly with respect to efficiency concerns, which representation is used. But the two kinds of representations can function differently in some cases.
One way in which extensional and intensional representations differ is that they do not necessarily correspond one-to-one with each other. For example, the extensionally defined set {Earth} has many intensional definitions in addition to the one we just gave. Others include:
{x : sun-planet(x) ∧ nth-farthest-from-sun(x, 3)}
{x : sun-planet(x) ∧ nth-biggest(x, 5)}
Thus, while it is trivial to determine whether two sets are identical if extensional descriptions are used, it may be very difficult to do so using intensional descriptions. Intensional representations have two important properties that extensional ones lack, however. The first is that they can be used to describe infinite sets and sets not all of whose elements are explicitly known. Thus we can describe intensionally such sets as prime numbers (of which there are infinitely many) or kings of England (even though we do not know who all of them are or even how many of them there have been). The second thing we can do with intensional descriptions is to allow them to depend on parameters that can change, such as time or spatial location. If we do that, then the actual set that is represented by the description will change as a function of the value of those parameters. To see the effect of this, consider the sentence, "The president of the United States used to be a Democrat," uttered when the current president is a Republican. This sentence can mean two things. The first is that the specific person who is now president was once a Democrat. This meaning can be captured straightforwardly with an extensional representation of "the president of the United States." We just specify the individual. But there is a second meaning, namely that there was once someone who was the president and who was a Democrat. To represent the meaning of "the president of the United States" given this interpretation requires an intensional description that depends on time. Thus we might write president(t), where president is some function that maps instances of time onto instances of people, namely U.S. presidents.
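In code, the contrast looks roughly like this Python sketch; the predicates below are illustrative stand-ins, and the prime test at the end shows an intensional rule for a set that could never be listed.

    inhabited_planets_ext = {"Earth"}          # extensional: list the members

    def sun_planet(x):                         # assumed predicate
        return x in {"Mercury", "Venus", "Earth", "Mars", "Jupiter",
                     "Saturn", "Uranus", "Neptune"}

    def human_inhabited(x):                    # assumed predicate
        return x == "Earth"

    def inhabited_planets_int(x):              # intensional: a membership rule
        return sun_planet(x) and human_inhabited(x)

    print("Earth" in inhabited_planets_ext)    # True
    print(inhabited_planets_int("Mars"))       # False

    # An intensional rule can describe an infinite set, e.g. the primes:
    def is_prime(n):
        return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

    print(is_prime(7919))                      # True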
Finding the Right Structures as Needed
Suppose we have a script (a description of a class of events in terms of contexts, participants, and subevents) that describes the typical sequence of events in a restaurant. This script would enable us to take a text such as
John went to Steak and Ale last night. He ordered a large rare steak, paid his bill, and left.
and answer "yes" to the question:
Did John eat dinner last night?
One thing we must notice is that nowhere in the story was it mentioned explicitly that John ate anything. But the fact that when one goes to a restaurant, one eats, will be contained in the restaurant script. If we know in advance to use the restaurant script, then we can answer the question easily. But in order to be able to reason about a variety of things, a system must have many scripts for everything from going to work to sailing around the world. How will it select the appropriate one each time? For example, nowhere in our story was the word "restaurant" mentioned.
In fact, in order to have access to the right structure for describing a particular situation, it is necessary to solve all of the following problems.
How to perform an initial selection of the most appropriate structure.
How to fill in appropriate details from the current situation.
How to find a better structure if the one chosen initially turns out not to be appropriate.
What to do if none of the available structures is appropriate.
When to create and remember a new structure.
There is no good general purpose method for solving all these problems. Some knowledge-representation techniques solve some of them. In this section we survey some solutions to two of these problems: how to select an initial structure to consider and how to find a better structure if that one turns out not to be a good match.
Selecting an Initial Structure
Selecting candidate knowledge structures to match a particular problem-solving situation is a hard problem; there are several ways in which it can be done. Three important approaches are the following:
1. Index the structures directly by the significant English words that can be used to describe them. For example, let each verb have associated with it a structure that describes its meaning. This is the approach taken in conceptual dependency theory. Even for selecting simple structures, such as those representing the meanings of individual words, though, this approach may not be adequate, since many words may have several distinct meanings. For example, the word "fly" has a different meaning in each of the following sentences:
- John flew to New York. (He rode in a plane from one place to another.)
- John flew a kite. (He held a kite that was up in the air.)
- John flew down the street. (He moved very rapidly.)
- John flew into a rage. (An idiom)
Another problem with this approach is that it is only useful when there is an English description of the problem to be solved.
2. Consider each major concept as a pointer to all of the structures (such as scripts) in which it might be involved. This may produce several sets of prospective structures. For example, the concept Steak might point to two scripts, one for restaurant and one for supermarket. The concept Bill might point to a restaurant and a shopping script. Take the intersection of those sets to get the structure(s), preferably precisely one, that involves all the content words. Given the pointers just described and the story about John's trip to Steak and Ale, the restaurant script would be evoked. One important problem with this method is that if the problem description contains any even slightly extraneous concepts, then the intersection of their associated structures will be empty. This might occur if we had said, for example, "John rode his bicycle to Steak and Ale last night." Another problem is that it may require a great deal of computation to compute all of the possibility sets and then to intersect them. However, if computing such sets and intersecting them could be done in parallel, then the time required to produce an answer would be reasonable even if the total number of computations is large.
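A small Python sketch of this intersection method (with assumed pointer sets modeled on the Steak and Ale example) also exhibits the failure mode just noted: one extraneous concept empties the intersection.

    pointers = {
        "steak": {"restaurant-script", "supermarket-script"},
        "bill":  {"restaurant-script", "shopping-script"},
    }

    def select_structure(concepts):
        # Intersect the candidate sets of all the content concepts.
        candidates = None
        for c in concepts:
            s = pointers.get(c, set())
            candidates = s if candidates is None else candidates & s
        return candidates or set()

    print(select_structure(["steak", "bill"]))             # {'restaurant-script'}

    # A slightly extraneous concept ruins the intersection:
    pointers["bicycle"] = {"errand-script"}
    print(select_structure(["steak", "bill", "bicycle"]))  # set()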
3. Locate one major clue in the problem description and use it to select an initial structure. As other clues appear, use them to refine the initial selection or to make a completely new one if necessary. The major problem with this method is that in some situations there is not an easily identifiable major clue. A second problem is that it is necessary to anticipate which clues are going to be important and which are not. But the relative importance of clues can change dramatically from one situation to another. For example, in many contexts, the colour of the objects involved is not important. But if we are told “The light turned red," then the colour of the light is the most important feature to consider.
Unfortunately, none of these proposals seems to be the complete answer to the problem. Moreover, the more complex the knowledge structures are, the harder it is to tell when a particular one is appropriate.
Revising the Choice When Necessary
Depending on the representation we are using, the details of the matching process will vary. It may require variables to be bound to objects. It may require attributes to have their values compared. In any case, if values that satisfy the required restrictions as imposed by the knowledge structure can be found, they are put into the appropriate places in the structure. If no appropriate values can be found, then a new structure must be selected. The way in which the attempt to instantiate this first structure failed may provide useful cues as to which one to try next. If, on the other hand, appropriate values can be found, then the current structure can be taken to be appropriate for describing the current situation. But, of course, that situation may change. Then information about what happened (for example, we walked around the room we were looking at) may be useful in selecting a new structure to describe the revised situation.
When the process runs into a snag, though, it is often not necessary to abandon the effort and start over. Rather, there are a variety of things that can be done:
Select the fragments of the current structure that do correspond to the situation and match them against candidate alternatives. Choose the best match. If the current structure was at all close to being appropriate, much of the work that has been done to build substructures to fit into it will be preserved.
Make an excuse for the current structure's failure and continue to use it. For example, a proposed chair with only three legs might simply be broken. Or there might be another object in front of it which occludes one leg. Part of the structure should contain information about the features for which it is acceptable to make excuses. Also, there are general heuristics, such as the fact that a structure is more likely to be appropriate if a desired feature is missing (perhaps because it is hidden from view) than if an inappropriate feature is present. For example, a person with one leg is more plausible than a person with a tail.
Refer to specific stored links between structures to suggest new directions in which to explore. An example of this sort of linking among a set of frames is shown in the similarity network shown in Figure 3.11.
If the knowledge structures are stored in an isa hierarchy, traverse upward in it until a structure is found that is sufficiently general that it does not conflict with the evidence. Either use this structure if it is specific enough to provide the required knowledge or consider creating a new structure just below the matching one.
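As a sketch of the last strategy above, the following Python fragment climbs an isa hierarchy from the failed structure until a candidate no longer conflicts with the evidence; the structures and their required features are illustrative assumptions.

    isa = {"desk-chair": "chair", "chair": "furniture", "furniture": None}
    required = {
        "desk-chair": {"seat", "back", "wheels"},
        "chair":      {"seat", "back"},
        "furniture":  set(),
    }

    def generalize(structure, observed_features):
        # Return the most specific ancestor whose requirements the evidence meets.
        while structure is not None:
            if required[structure] <= observed_features:
                return structure
            structure = isa[structure]
        return None

    print(generalize("desk-chair", {"seat", "back", "legs"}))  # chair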
Student Activity 3.2
Before reading the next section, answer the following questions.
1. Write short notes on inferential knowledge and procedural knowledge.
2. What are the main issues in knowledge representation?
3. What is the use of “instance” and “isa” attributes?
If your answers are correct, then proceed to the next section.
The Frame Problem
We have seen several methods for representing knowledge that would allow us to form complex state descriptions for a search program. The next concern is how to represent efficiently the sequences of problem states that arise from a search process. For complex, ill-structured problems, this can be a serious matter.
Figure 3.11: A Similarity Net
Consider the world of a household robot. There are many objects and relationships in the world, and a state description must somehow include facts like on(Plant12, Table34), under(Table34, Window13), and in(Table34, Room15). One strategy is to store each state description as a list of such facts. Most of the facts will not change from one state to another, yet each fact will be represented once at every node, and we will quickly run out of memory. Furthermore, we will spend the majority of our time creating these nodes and copying these facts, most of which do not change often, from one node to another. For example, in the robot world, we could spend a lot of time recording above(Ceiling, Floor) at every node. All of this is in addition to the real problem of figuring out which facts should be different at each node.
This whole problem of representing the facts that change as well as those that do not is known as the frame problem. In some domains, the only hard part is representing all the facts. In others, though, figuring out which ones change is nontrivial. For example, in the robot world, there might be a table with a plant on it under the window. Suppose we move the table to the center of the room. We must also infer that the plant is now in the center of the room too but that the window is not.
To support this kind of reasoning, some systems make use of an explicit set of axioms called frame axioms, which describe all the things that do not change when a particular operator is applied in state n to produce state n+1. Thus, in the robot domain, we might write axioms such as
color(x, y, s1) ∧ move(x, s1, s2) → color(x, y, s2)
which can be read as, "If x has colour y in state s1 and the operation of moving x is applied in state s1 to produce state s2, then the colour of x in s2 is still y." Unfortunately, in any complex domain, a huge number of these axioms becomes necessary. An alternative approach is to make the assumption that the only things that change are the things that must. By "must" here we mean that the change is either required explicitly by the axioms that describe the operator or that it follows logically from some change that is asserted explicitly. This idea of circumscribing the set of unusual things is a very powerful one; it can be used as a partial solution to the frame problem and as a way of reasoning with incomplete knowledge.
Let us return briefly to the problem of representing a changing problem state. We can do that simply by starting with a description of the initial state and then making changes to that description as indicated by the rules we apply. This solves the problem of the wasted space and time involved in copying the information for each node. And it works fine until the first time the search has to backtrack. Then, unless all the changes that were made can simply be ignored, we are faced with the problem of backing up to some earlier node. But how do we know what changes in the problem state description need to be undone? There are two ways this problem can be solved.
Do not modify the initial state description at all. At each node, store an indication of the specific changes that should be made at this node. Whenever it is necessary to refer to the description of the current problem state, look at the initial state description and also look back through all the nodes on the path from the start state to the current state. This approach makes backtracking very easy, but it makes referring to the state description fairly complex.
Modify the initial state description as appropriate, but also record at each node an indication of what to do to undo the move should it ever be necessary to backtrack through the node. Then, whenever it is necessary to backtrack, check each node along the way and perform the indicated operations on the state description.
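A minimal Python sketch of the second option, with an illustrative state and moves: each node pushes an undo record as it changes the single shared state description, and backtracking pops and applies it.

    state = {"Plant12": "Table34", "Table34": "Room15"}  # object -> location
    undo_stack = []  # one undo record per node on the current path

    def apply_move(obj, new_location):
        undo_stack.append((obj, state.get(obj)))  # remember the old value
        state[obj] = new_location

    def backtrack():
        obj, old_location = undo_stack.pop()      # reverse the latest change
        state[obj] = old_location

    apply_move("Table34", "Center-of-Room15")
    print(state["Table34"])  # Center-of-Room15
    backtrack()
    print(state["Table34"])  # Room15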
Answer the following questions.
1. What do you understand by a frame? How is it useful in knowledge representation?
2. What are frame axioms, and where are they used?
Summary
Facts: truths in some relevant world. These are the things we want to represent.
The forward representation mapping maps from facts to representations. The backward representation mapping goes the other way, from representations to facts.
Representations of facts in some chosen formalism are the things we can actually manipulate.
Relational knowledge corresponds to a set of attributes and associated values that together describe the objects of the knowledge base.
It is possible to augment the basic representation with inference mechanisms that operate on the structure of the representation. For this to be effective, the structure must be designed to correspond to the inference mechanisms that are desired.
One of the most useful forms of inference is property inheritance, in which elements of specific classes inherit attributes and values from more general classes in which they are included.
In order to support property inheritance, objects must be organized into classes and classes must be arranged in a generalization hierarchy.
Learning Objectives
After reading this unit you should appreciate the following:
Representation and Mapping
Approaches to Knowledge Representation
The Frame Problem
We had discussed the role that knowledge plays in AI systems in earlier units. In the later unit we have paid little attention to knowledge and its importance as we had focused on basic frameworks for building search-based problem-solving programs. These methods are sufficiently general that we have been able to discuss them without reference to how the knowledge they need is to be represented. Although these methods are useful and form the skeleton of many of the methods we are about to discuss, their problem-solving power is limited precisely because of their generality. As we look in more detail at ways of representing knowledge, it becomes clear that particular knowledge representation models allow for more specific, more powerful problem-solving mechanisms that operate on them. In this unit, we return to the topic of knowledge used within the programs and examine specific techniques that can be used for representing and manipulating knowledge.
Top
Representations and Mappings
Artificial Intelligence needs both a large amount of knowledge and some mechanisms for manipulating that knowledge to create solutions to new problems. A variety of ways of representing knowledge have been exploited in AI programs. But before we can talk about them individually, we must consider the following point that pertains to all discussions of representation, namely that we are dealing with two different kinds of entities:
Facts: truths in some relevant world. These are the things we want to represent.
Representations of facts in some chosen formalism. These are the things we will actually be able to manipulate.
Figure 3.1: Mappings between Facts and Representations
One way to think of structuring these entities is as two levels:
The knowledge level, at which facts (including each agent's behaviors and current goals) are described.
The symbol level, at which representations of objects at the knowledge level are defined in terms of symbols that can be manipulated by programs.
Rather than thinking of one level on top, of another, we will focus on facts, on representations, and on the two-way mappings that must exist between them as shown in Figure 3.1. We will call these links representation mappings. The forward representation mapping maps from facts to representations. The backward representation mapping goes the other way, from representations to facts.
One representation of facts is so common that it deserves special mention: natural language sentences. Regardless of the representation for facts that we use in a program, we may also need to be concerned with an English representation of those facts in order to facilitate getting information into and out of the system. In this case, we must also have mapping functions from English sentences to the representation we are actually going to use and from it back to sentences. Figure 3.1 shows how these three kinds of objects relate to each other.
Let's look at a simple example using mathematical logic as the representational formalism. Consider the English sentence:
Tom is a cat.
The fact represented by this English sentence can also be represented in logic as:
Cat (Tom)
Suppose that we also have a logical representation of the fact that all cats have tails:
"x : cats (x)à hastail(x)
Then, using the deductive mechanisms of logic, we may generate the new representation object:
hastail(Tom)
Using an appropriate backward mapping function, we could then generate the English sentence:
Tom has a tail.
Or we could make use of this representation of a new fact to cause us to take some appropriate action or to derive representations of additional facts.
It is important to keep in mind that usually the available mapping functions are not one-to-one. In fact, they are often not even functions but rather many-to-many relations. (In other words, each object in the domain may map to several elements in the range, and several elements in the domain may map to the same element of the range.) This is particularly true of the mappings involving English representations of facts. For example, the two sentences "All cats have tails" and "Every cat has a tail" could both represent the same fact, namely that every cat has at least one tail. On the other hand, the former could represent either the fact that every cat has at least one tail or the fact that each cat has several tails. The latter may represent either the fact that every cat has at least one tail or the fact that there is a tail that every cat has. As we will see shortly, when we try to convert English sentences into some other representation, such as logical propositions, we must first decide what facts the sentences represent and then convert those facts into the new representation.
The starred links of Figure 3.1 are key components of the design of any knowledge-based program. Sometimes, a good representation makes the operation of a reasoning program not only correct but also trivial. A well-known example of this occurs in the context of the mutilated checkerboard problem, which can be stated as follows:
The Mutilated Checkerboard Problem. Consider a normal checkerboard from which two squares; in opposite corners have been removed. The task is to cover all the remaining squares exactly with dominoes, each of which covers two squares. No overlapping, either of dominoes on top of each other or of dominoes over the boundary of the mutilated board is allowed. Can this task, be done?
To solve this problem we can enumerate all possible tilings to see if one works. But suppose one wants to be cleverer. Figure 3.2 shows three ways in which the mutilated checkerboard could be represented (to a person). The first representation does not directly suggest the answer to the problem. The second may; the third does when combined with the single additional fact that each domino must cover exactly one while square and one black square. Even for human problem solvers a representation shift may make an enormous difference in problem-solving effectiveness.
Figure 3.2: Three Representations of a Mutilated Checkerboard
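The third representation turns the problem into simple counting: each domino covers one white and one black square, so a complete tiling requires equal numbers of each. A minimal sketch of that counting argument (using the standard coordinate colouring of a board):

# Remove two opposite corners; opposite corners always share a colour.
removed = {(0, 0), (7, 7)}
squares = [(r, c) for r in range(8) for c in range(8) if (r, c) not in removed]

white = sum((r + c) % 2 == 0 for r, c in squares)
black = len(squares) - white
print(white, black)   # 30 and 32: unequal counts, so no tiling can exist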
The mapping between the facts and representations which was shown in Figure 3.1 has been expanded in Figure 3.3. The dotted line across the top represents the abstract reasoning process that a program is intended to model. The solid line across the bottom represents the concrete reasoning process that a particular program performs. This program successfully models the abstract process to the extent that, when the backward representation mapping is applied to the program's output, the appropriate final facts are actually generated. If either the program’s operation or one of the representation mappings is not faithful to the problem that is being modeled, then the final facts will probably not be the desired ones. The key role that is played by the nature of the representation mapping is apparent from this figure. If no good mapping can be defined for a problem, then no matter how good the program to solve the problem is, it will not be able to produce answers that correspond to real answers to the problem.
Figure 3.3 looks very much like the figure that might appear in a general programming book as a description of the relationship between an abstract data type (such as a set) and a concrete implementation of that type (e.g. as a linked list of elements). But there are some differences. For example, in data type design it is expected that the mapping that we are calling the backward representation mapping is a function (i.e., every representation corresponds to only one fact) and that it is onto (i.e., there is at least one representation for every fact). Unfortunately, in many AI domains, it may not be possible to come up with such a representation mapping, and we may have to live with one that gives less ideal results. But the main idea of what we are doing is the same as what programmers always do, namely to find concrete implementations of abstract concepts.
Figure 3.3: Representation of Fact
Student Activity 3.1
Before reading the next section, answer the following questions.
1. Discuss different ways of knowledge representation.
2. Represent the following facts in logic:
a. John plays football.
b. Jim loves Jani.
c. Caesar was a ruler.
If your answers are correct, then proceed to the next section.
Approaches to Knowledge Representation
A good system for the representation of knowledge in a particular domain should possess the following four properties:
Representational Adequacy – the ability to represent all of the kinds of knowledge that are needed in that domain.
Inferential Adequacy – the ability to manipulate the representational structures in such a way as to derive new structures corresponding to new knowledge inferred from old.
Inferential Efficiency – the ability to incorporate into the knowledge structure additional information that can be used to focus the attention of the inference mechanisms in the most promising directions.
Acquisitional Efficiency – the ability to acquire new information easily. The simplest case involves direct insertion, by a person, of new knowledge into the database. Ideally, the program itself would be able to control knowledge acquisition.
Multiple techniques for knowledge representation exist because, unfortunately, no single system that optimizes all of these capabilities for all kinds of knowledge has yet been found. Many programs rely on more than one technique. As we proceed, the most important of these techniques are described in detail. But in this section, we provide a simple, example-based introduction to the important ideas.
Simple Relational Knowledge
The simplest way to represent declarative facts is as a set of relations of the same sort used in database systems. Figure 3.4 shows an example of such a relational system.
Figure 3.4: Simple Relational Knowledge
The reason that this representation is simple is that, standing alone, it provides very weak inferential capabilities. But knowledge represented in this form may serve as the input to more powerful inference engines. For example, given just the facts of Figure 3.4, it is not possible even to answer the simple question, "Who is the heaviest player?" But if a procedure for finding the heaviest player is provided, then these facts will enable the procedure to compute an answer. If, instead, we are provided with a set of rules for deciding which hitter to put up against a given pitcher (based on right- and left-handedness, say), then this same relation can provide at least some of the information required by those rules.
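Such a procedure is easy to supply. In the sketch below, the facts mirror the flavour of Figure 3.4 (the names and weights here are illustrative, not the figure's actual contents); the procedure, not the facts, does the inferential work:

# Simple relational knowledge: one row of attribute-value pairs per player.
players = [
    {"name": "Hank-Aaron",  "weight": 180, "bats": "Right"},
    {"name": "Willie-Mays", "weight": 170, "bats": "Right"},
    {"name": "Stan-Musial", "weight": 175, "bats": "Left"},
]

# The facts alone cannot answer "Who is the heaviest player?"; this can.
heaviest = max(players, key=lambda row: row["weight"])
print(heaviest["name"])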
Providing support for relational knowledge is what database systems are designed to do. Thus we do not need to discuss this kind of knowledge representation structure further here. The practical issues that arise in linking a database system that provides this kind of support to a knowledge representation system that provides some of the other capabilities that we are about to discuss have already been solved in several commercial products.
Inheritable Knowledge
It is possible to augment the basic representation with inference mechanisms that operate on the structure of the representation. For this to be effective the structure must be designed to correspond to the inference mechanisms that are desired. One of the most useful forms of inference is property inheritance, in which elements of specific classes inherit attributes and values from more general classes in which they are included.
Objects must be organized into classes and classes must be arranged in a generalization hierarchy in order to support property inheritance. Figure 3.5 shows some additional baseball knowledge inserted into a structure that is so arranged. Lines represent attributes. Boxed nodes represent objects and values of attributes of objects. These values can also be viewed as objects with attributes and values, and so on. The arrows on the lines point from an object to its value along the corresponding attribute line. The structure shown in the figure is a slot-and-filler structure. It may also be called a semantic network or a collection of frames. In the latter case, each individual frame represents the collection of attributes and values associated with a particular node. Figure 3.6 shows the node for baseball player displayed as a frame.
Figure 3.5: Inheritable Knowledge
Usually the use of the term frame system implies somewhat more structure on the attributes and the inference mechanisms that are available to apply to them than does the term semantic network.
Figure 3.6: Viewing a Node as a Frame
All of the objects and most of the attributes shown in this example have been chosen to correspond to the baseball domain, and they have no general significance. The two exceptions to this are the attribute isa, which is being used to show class inclusion, and the attribute instance, which is being used to show class membership. These two specific (and generally useful) attributes provide the basis for property inheritance as an inference technique. Using this technique, the knowledge base can support retrieval both of facts that have been explicitly stored and of facts that can be derived from those that are explicitly stored. An idealized form of the property inheritance algorithm can be stated as follows:
Algorithm: Property Inheritance
To retrieve a value V for attribute A of an instance object O:
1. Find O in the knowledge base.
2. If there is a value there for the attribute A, report that value.
3. Otherwise, see if there is a value for the attribute instance. If not, then fail.
4. Otherwise, move to the node corresponding to that value and look for a value for the attribute A. If one is found, report it.
5. Otherwise, do until there is no value for the isa attribute or until an answer is found:
a. Get the value of the isa attribute and move to that node.
b. See if there is a value for the attribute A. If there is, report it.
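The algorithm translates almost line for line into code. Below is a minimal Python sketch; the knowledge base is a dictionary of nodes, each holding attribute-value pairs, with instance and isa stored like any other attribute. The node contents echo Figures 3.5 and 3.6 but are our own reconstruction:

kb = {
    "Person":             {"handed": "Right"},
    "Adult-Male":         {"isa": "Person", "height": "5-10"},
    "Baseball-Player":    {"isa": "Adult-Male", "height": "6-1",
                           "batting-average": ".252"},
    "Pitcher":            {"isa": "Baseball-Player", "batting-average": ".106"},
    "Three-Finger-Brown": {"instance": "Pitcher"},
}

def get_value(obj, attr):
    node = kb[obj]
    if attr in node:                 # step 2: value stored directly on O
        return node[attr]
    if "instance" not in node:       # step 3: no class to inherit from
        return None
    node = kb[node["instance"]]      # step 4: move to the class node
    while True:
        if attr in node:             # steps 4 and 5b: report a found value
            return node[attr]
        if "isa" not in node:        # step 5: ran out of isa links
            return None
        node = kb[node["isa"]]       # step 5a: climb the isa hierarchy

print(get_value("Three-Finger-Brown", "batting-average"))   # .106
print(get_value("Three-Finger-Brown", "handed"))            # Right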
This procedure is simplistic. It does not say what we should do if there is more than one value of the instance or isa attribute. But it does describe the basic mechanism of inheritance. We can apply this procedure to our example knowledge base to derive answers to the following queries:
team(Pee-Wee-Reese) = Brooklyn-Dodgers. This attribute had a value stored explicitly in the knowledge base.
batting-average(Three-Finger-Brown) = .106. Since there is no value for batting average stored explicitly for Three Finger Brown, we follow the instance attribute to Pitcher and extract the value stored there. Now we observe one of the critical characteristics of property inheritance, namely that it may produce default values that are not guaranteed to be correct but that represent "best guesses" in the face of a lack of more precise information. In fact, in 1906, Brown's batting average was .204.
height(Pee-Wee-Reese) = 6-1. This represents another default inference. Notice here that because we get to it first, the more specific fact about the height of baseball players overrides a more general fact about the height of adult males.
bats(Three-Finger-Brown) = Right. To get a value for the attribute bats required going up the isa hierarchy to the class Baseball-Player. But what we found there was not a value but a rule for computing a value. This rule required another value (that for handed) as input. So the entire process must be begun again recursively to find a value for handed. This time, it is necessary to go all the way up to Person to discover that the default value for handedness for people is Right. Now the rule for bats can be applied, producing the result Right. In this case, that turns out to be wrong, since Brown is a switch hitter (i.e., he can hit both left- and right-handed).
Inferential Knowledge
Figure 3.7 shows two examples of the use of first-order predicate logic to represent additional knowledge about baseball.
Figure 3.7: Inferential Knowledge
Of course, this knowledge is useless unless there is also an inference procedure that can exploit it. The required inference procedure now is one that implements the standard logical rules of inference. There are many such procedures, some of which reason forward from given facts to conclusions, others of which reason backward from desired conclusions to given facts. One of the most commonly used of these procedures is resolution, which exploits a proof-by-contradiction strategy.
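To give a flavour of resolution, the sketch below performs a single propositional resolution step on clauses represented as sets of literals, where a negated literal carries a "~" prefix. A real resolution prover adds unification and a search strategy on top of this step; the fragment and its clause encoding are our own illustration:

def negate(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def resolve(c1, c2):
    """Return every resolvent of the two clauses."""
    resolvents = []
    for lit in c1:
        if negate(lit) in c2:        # complementary pair found
            resolvents.append((c1 - {lit}) | (c2 - {negate(lit)}))
    return resolvents

# From "not cat(Tom) or hastail(Tom)" and "cat(Tom)", derive "hastail(Tom)".
print(resolve(frozenset({"~cat", "hastail"}), frozenset({"cat"})))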
In general, in fact, all of the techniques we are describing here should not be regarded as complete and incompatible ways of representing knowledge. Instead, they should be viewed as building blocks of a complete representational system.
Procedural Knowledge
Another, equally useful, kind of knowledge is operational, or procedural, knowledge, which specifies what to do when. Procedural knowledge can be represented in programs in many ways. The most common way is simply as code (in some programming language such as LISP) for doing something. The machine uses the knowledge when it executes the code to perform a task. Unfortunately, this way of representing procedural knowledge gets low scores with respect to the properties of inferential adequacy (because it is very difficult to write a program that can reason about another program's behaviour) and acquisitional efficiency (because the process of updating and debugging large pieces of code becomes unwieldy).
Figure 3.8: Using LISP Code to Define a Value
As an extreme example, compare the representation of the way to compute the value of bats shown in Figure 3.6 to the LISP one shown in Figure 3.8. Although the LISP one will work, given a particular way of storing attributes and values in a list, it does not lend itself to being reasoned about in the same straightforward way as the representation of Figure 3.6 does. The LISP representation is slightly more powerful, since it makes explicit use of the name of the node whose value for handed is to be found. But if this matters, the simpler representation can be augmented to do this as well.
Because of this difficulty in reasoning with LISP, attempts have been made to find other ways of representing procedural knowledge so that it can relatively easily be manipulated both by other programs and by people.
The most commonly used technique for representing procedural knowledge in AI programs is the use of production rules. Figure 3.9 shows an example of a production rule that represents a piece of operational knowledge typically possessed by a baseball player.
Production rules, particularly ones that are augmented with information on how they are to be used, are more procedural than are the other representation methods discussed in this chapter. But making a clean distinction between declarative and procedural knowledge is difficult. Although at an intuitive level such a distinction makes some sense, at a formal level it disappears. In fact, as you can see, the structure of the declarative knowledge of Figure 3.7 is not substantially different from that of the operational knowledge of Figure 3.9. The important difference is in how the knowledge is used by the procedures that manipulate it.
Figure 3.9: Procedural Knowledge as Rules
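We cannot reproduce Figure 3.9 here, but a production rule of this operational kind can be pictured as a condition-action pair. The rule below is hypothetical; the situation keys and the rule's content are invented for illustration:

# IF the condition holds of the current situation, THEN take the action.
rule = {
    "if":   lambda s: s["runner_on_first"] and s["outs"] < 2,
    "then": "be prepared to cover first base on a bunt",
}

situation = {"runner_on_first": True, "outs": 1}
if rule["if"](situation):
    print(rule["then"])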
Issues in Knowledge Representation
Before embarking on a discussion of specific mechanisms that have been used to represent various kinds of real-world knowledge, we need briefly to discuss several issues that cut across all of them:
Are any attributes of objects so basic that they occur in almost every problem domain? If there are, we need to make sure that they are handled appropriately in each of the mechanisms we propose. If such attributes exist, what are they?
Are there any important relationships that exist among attributes of objects?
At what level should knowledge be represented? Is there a good set of primitives into which all knowledge can be broken down? Is it helpful to use such primitives?
How should sets of objects be represented?
Given a large amount of knowledge stored in a database, how can relevant parts be accessed when they are needed?
We will talk about each of these questions briefly in the next five sections.
Important Attributes
Instance and isa are two attributes that are of very general significance. These attributes are important because they support property inheritance. They are called a variety of things in AI systems, but the names do not matter. What does matter is that they represent class membership and class inclusion and that class inclusion is transitive. In slot-and-filler systems, these attributes are usually represented explicitly in a way much like that shown in Figures 3.5 and 3.6. In logic-based systems, these relationships may be represented this way or they may be represented implicitly by a set of predicates describing particular classes.
Relationships among Attributes
The attributes that we use to describe objects are themselves entities that we represent. What properties do they have independent of the specific knowledge they encode? There are four such properties that deserve mention here:
Inverses
Existence in an isa hierarchy
Techniques for reasoning about values
Single-valued attributes
Inverses
Entities in the world are related to each other in many different ways. But as soon as we decide to describe those relationships as attributes, we commit to a perspective in which we focus on one object and look for binary relationships between it and others. Attributes are those relationships. So, for example, in Figure 3.5, we used the attributes instance, isa, and team. Each of these was shown in the figure with a directed arrow, originating at the object that was being described and terminating at the object representing the value of the specified attribute. But we could equally well have focused on the object representing the value. If we do that, then there is still a relationship between the two entities, although it is a different one since the original relationship was not symmetric (although some relationships, such as sibling, are). In many cases, it is important to represent this other view of relationships. There are two good ways to do this.
The first is to represent both relationships in a single representation that ignores focus. Logical representations are usually interpreted as doing this. For example, the assertion:
team(Pee-Wee-Reese, Brooklyn-Dodgers)
can equally easily be interpreted as a statement about Pee Wee Reese or about the Brooklyn Dodgers. How it is actually used depends on the other assertions that a system contains.
The second approach is to use attributes that focus on a single entity but to use them in pairs, one the inverse of the other. In this approach, we would represent the team information with two attributes:
one associated with Pee Wee Reese
team = Brooklyn-Dodgers
one associated with Brooklyn Dodgers
team-members = Pee-Wee-Reese, …
This is the approach that is taken in semantic net and frame based systems. When it is used, it is usually accompanied by a knowledge acquisition tool that guarantees the consistency of inverse slots by forcing them to be declared, and then checking each time a value is added to one attribute that the corresponding value is added to the inverse.
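The bookkeeping such an acquisition tool does can be sketched in a few lines. In this hedged Python fragment (slot names follow the example above; the storage scheme is our own), every addition to a slot triggers the mirrored addition to its declared inverse, with a membership check stopping the echo:

inverses = {"team": "team-members", "team-members": "team"}
slots = {}   # maps (object, attribute) to a list of values

def add_value(obj, attr, value):
    bucket = slots.setdefault((obj, attr), [])
    if value not in bucket:          # the guard that stops infinite echoing
        bucket.append(value)
        add_value(value, inverses[attr], obj)   # keep the inverse consistent

add_value("Pee-Wee-Reese", "team", "Brooklyn-Dodgers")
print(slots[("Brooklyn-Dodgers", "team-members")])   # ['Pee-Wee-Reese']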
An Isa Hierarchy of Attributes
There are attributes and specializations of attributes, just as there are classes of objects and specialized subsets of those classes. Consider, for example, the attribute height. It is actually a specialization of the more general attribute physical-size, which is, in turn, a specialization of physical-attribute. These generalization-specialization relationships are important for attributes for the same reason that they are important for other concepts: they support inheritance. In the case of attributes, they support inheriting information about such things as constraints on the values that the attribute can have and mechanisms for computing those values.
Techniques for Reasoning about Values
Sometimes values of attributes are specified explicitly when a knowledge base is created. We saw several examples of that in the baseball example of Figure 3.5. But often the reasoning system must reason about values it has not been given explicitly. Several kinds of information can play a role in this reasoning, including:
Information about the type of the value. For example, the value of height must be a number measured in a unit of length.
Constraints on the value often stated in terms of related entities. For example, the age of a person cannot be greater than the age of either of that person's parents.
Rules for computing the value when it is needed. We showed an example of such a rule in Figure 3.5 for the bats attribute. These rules are called backward rules. Such rules have also been called if-needed rules.
Rules that describe actions that should be taken, if a value ever becomes known. These rules are called forward rules, or sometimes if-added rules.
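Both kinds of rules attach small procedures to attributes. A minimal sketch of the idea (the attribute names and rule bodies are invented for illustration):

# Backward (if-needed) rules compute a value only when it is asked for;
# forward (if-added) rules fire as soon as a value is stored.
if_needed = {"bats": lambda node: node.get("handed", "Right")}
if_added  = {"team": lambda node, v: print(node["name"], "now plays for", v)}

def fetch(node, attr):
    if attr in node:
        return node[attr]
    if attr in if_needed:
        return if_needed[attr](node)   # fire the if-needed rule
    return None

def store(node, attr, value):
    node[attr] = value
    if attr in if_added:
        if_added[attr](node, value)    # fire the if-added rule

player = {"name": "Pee-Wee-Reese", "handed": "Right"}
print(fetch(player, "bats"))           # computed on demand: Right
store(player, "team", "Brooklyn-Dodgers")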
Single-Valued Attributes
A specific but very useful kind of attribute is one that takes a unique value. For example, a baseball player can, at any one time, have only a single height and be a member of only one team. If there is already a value present for one of these attributes and a different value is asserted, then one of two things has happened. Either a change has occurred in the world or there is now a contradiction in the knowledge base that needs to be resolved. Knowledge-representation systems have taken several different approaches to providing support for single-valued attributes, including:
Introduce an explicit notation for temporal interval. If two different values are ever asserted for the same temporal interval, signal a contradiction automatically.
Assume that the only temporal interval that is of interest is now. So if a new value is asserted, replace the old value.
Provide no explicit support. Logic-based systems are in this category. But in these systems, knowledge-base builders can add axioms that state that if an attribute has one value then it is known not to have all other values.
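The first and second approaches in this list can be sketched together. The fragment below (our own illustration) either signals the contradiction or treats the new assertion as a change in the world and keeps only the value for now:

def assert_single_valued(node, attr, value, policy="replace"):
    if attr in node and node[attr] != value:
        if policy == "signal":
            raise ValueError(attr + " already has value " + str(node[attr]))
        # policy "replace": assume the world changed; keep only "now"
    node[attr] = value

player = {}
assert_single_valued(player, "team", "Brooklyn-Dodgers")
assert_single_valued(player, "team", "Los-Angeles-Dodgers")   # world changed
print(player["team"])   # Los-Angeles-Dodgers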
Choosing the Granularity of Representation
Irrespective of the particular formalism, it is necessary to answer the question: at what level of detail should the world be represented? Should there be a small number of low-level primitives, or a larger number covering a range of granularities? A brief example illustrates the problem. Suppose we are interested in the following fact:
John spotted Sue.
We could represent this as
spotted(agent(John), object(Sue))
Such a representation would make it easy to answer questions such as:
Who spotted Sue?
But now suppose we want to know:
Did John see Sue?
The obvious answer is "yes," but given only the one fact we have, we cannot discover that answer. We could, of course, add other facts, such as
∀x, y : spotted(x, y) → saw(x, y)
We could then infer the answer to the question.
An alternative solution to this problem is to represent explicitly, in the representation of the fact itself, that spotting is really a special type of seeing. We might write something such as:
saw(agent(John), object(Sue), timespan(briefly))
Here we have broken the idea of spotting apart into more primitive concepts of seeing and timespan. Using this representation, the fact that John saw Sue is immediately accessible. But the fact that he spotted her is more difficult to get to. The major advantage of converting all statements into a representation in terms of a small set of primitives is that the rules that are used to derive inferences from that knowledge need be written only in terms of the primitives rather than in terms of the many ways in which the knowledge may originally have appeared. Thus what is really being argued for is simply some sort of canonical form.
One of the several arguments which go against the use of low-level primitives is that simple high-level facts may require a lot of storage when broken down into primitives. Much of that storage is really wasted since the low-level rendition of a particular high-level concept will appear many times, once for each time the high-level concept is referenced.
For example, suppose that actions are represented as combinations of a small set of primitive actions, as in conceptual dependency theory, where the meanings of words are described as conceptual dependency (CD) structures. Then the fact that John punched Mary might be represented as shown in Figure 3.10(a). The representation says that there was physical contact between John's fist and Mary. The contact was caused by John propelling his fist toward Mary, and in order to do that John first went to where Mary was. But suppose we also know that Mary punched John. Then we must also store the structure shown in Figure 3.10(b). If, however, punching were represented simply as punching, then most of the detail of both structures could be omitted from the structures themselves. It could instead be stored just once in a common representation of the concept of punching.
A second, related problem is that if knowledge is initially presented to the system in a relatively high-level form, such as English, then substantial work must be done to reduce it to primitive form. Both in understanding language and in interpreting the world that we see, many things appear that later turn out to be irrelevant. For the sake of efficiency, it may be desirable to store these things at a very high level and then to analyze in detail only those inputs that appear to be important.
A third problem with the use of low-level primitives is that in many domains, it is not at all clear what the primitives should be. And even in domains in which there may be an obvious set of primitives, there may not be enough information present in each use of the high-level constructs to enable them to be converted into their primitive components. When this is true, there is no way to avoid representing facts at a variety of granularities.
Consider, for example, the representation of family relationships. There exists at least one obvious set of primitives: mother, father, son, daughter, and possibly brother and sister. But now suppose we are told that Mary is Sue's cousin. An attempt to describe the cousin relationship in terms of the primitives could produce any of the following interpretations:
Mary = daughter(brother(mother(Sue)))
Mary = daughter(sister(mother(Sue)))
Mary = daughter(brother(father(Sue)))
Mary = daughter(sister(father(Sue)))
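The four readings arise mechanically from two binary choices, as this small sketch makes explicit:

from itertools import product

# "Cousin" = child of a sibling of a parent; each slot has two primitives.
for sibling, parent in product(["brother", "sister"], ["mother", "father"]):
    print("Mary = daughter(" + sibling + "(" + parent + "(Sue)))")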
Figure 3.10: Redundant representations
As illustrated, the problem of choosing the correct granularity of representation for a particular body of knowledge is not easy. Clearly, the lower the level we choose, the less inference is required to reason with the knowledge in some cases, but the more inference is required to create the representation from English and the more room it takes to store, since many inferences will be represented many times. The answer for any particular task domain must come to a large extent from the domain itself: to what use is the knowledge to be put?
One way of looking at the question of whether there exists a good set of low-level primitives is that it is a question of the existence of a unique representation. Does there exist a single, canonical way in which large bodies of knowledge can be represented independently of how they were originally stated? Another, closely related, uniqueness question asks whether individual objects can be represented uniquely and independently of how they are described.
The phrase Evening Star names a certain large physical object of spherical form, which is hurtling through space some scores of millions of miles from here. The phrase Morning Star names the same thing, as was probably first established by some observant Babylonian. But the two phrases cannot be regarded as having the same meaning, otherwise that Babylonian could have dispensed with his observations and contented himself with reflecting on the meaning of his words. The meanings, then, being different from one another, must be other than the named object, which is one and the same in both cases. For a program to be able to reason as did the Babylonian, it must be able to handle several distinct representations that turn out to stand for the same object.
Representing Sets of Objects
There are several reasons why we must be able to represent sets of objects. One is that there are some properties that are true of sets that are not true of the individual members of a set. As examples, consider the assertions that are being made in the sentences "There are more sheep than people in Australia" and "English speakers can be found all over the world." The only way to represent the facts described in these sentences is to attach assertions to the sets representing people, sheep, and English speakers, since, for example, no single English speaker can be found all over the world.
Second, if a property is true of all (or even most) elements of a set, then it is more efficient to associate it once with the set rather than to associate it explicitly with every element of the set. We have already looked at ways of doing that, both in logical representations through the use of the universal quantifier and in slot-and-filler structures, where we used nodes to represent sets and inheritance to propagate set-level assertions down to individuals. As we consider ways to represent sets, we will want to consider both of these uses of set-level representations. We will also need to remember that the two uses must be kept distinct. Thus if we assert something like large(Elephant), it must be clear whether we are asserting some property of the set itself or some property that holds for individual elements of the set.
The simplest way in which a set may be represented is just by a name. This simple representation does make it possible to associate predicates with sets. But it does not, by itself, provide any information about the set it represents. It does not, for example, tell how to determine whether a particular object is a member of the set or not.
There are two ways to state a definition of a set and its elements. The first is to list the members. Such a specification is called an extensional definition. The second is to provide a rule that, when a particular object is evaluated, returns true or false depending on whether the object is in the set or not. Such a rule is called an intensional definition. For example, an extensional description of the set of our sun's planets on which people live is {Earth}. An intensional description is:
{x : sun-planet(x) ∧ human-inhabited(x)}
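In code, the distinction is the familiar one between a listed data structure and a membership predicate. A hedged sketch, in which both predicates are stubs standing in for real world knowledge:

# Extensional: the members are simply listed.
inhabited_planets_ext = {"Earth"}

def sun_planet(x):        # stub predicate, assumed available
    return x in {"Mercury", "Venus", "Earth", "Mars",
                 "Jupiter", "Saturn", "Uranus", "Neptune"}

def human_inhabited(x):   # stub predicate, assumed available
    return x == "Earth"

# Intensional: a rule decides membership only when asked.
def inhabited_planets_int(x):
    return sun_planet(x) and human_inhabited(x)

print("Earth" in inhabited_planets_ext, inhabited_planets_int("Earth"))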
For simple sets, it may not matter, except possibly with respect to efficiency concerns, which representation is used. But the two kinds of representations can function differently in some cases.
One way in which extensional and intensional representations differ is that they do not necessarily correspond one-to-one with each other. For example, the extensionally defined set {Earth} has many intensional definitions in addition to the one we just gave. Others include:
{x : sun-planet(x) ∧ nth-farthest-from-sun(x, 3)}
{x : sun-planet(x) ∧ nth-biggest(x, 5)}
Thus, while it is trivial to determine whether two sets are identical if extensional descriptions are used, it may be very difficult to do so using intensional descriptions. Intensional representations have two important properties that extensional ones lack, however. The first is that they can be used to describe infinite sets and sets not all of whose elements are explicitly known. Thus we can describe intensionally such sets as prime numbers (of which there are infinitely many) or kings of England (even though we do not know who all of them are or even how many of them there have been). The second thing we can do with intensional descriptions is to allow them to depend on parameters that can change, such as time or spatial location. If we do that, then the actual set that is represented by the description will change as a function of the value of those parameters. To see the effect of this, consider the sentence, "The president of the United States used to be a Democrat," uttered when the current president is a Republican. This sentence can mean two things. The first is that the specific person who is now president was once a Democrat. This meaning can be captured straightforwardly with an extensional representation of "the president of the United States." We just specify the individual. But there is a second meaning, namely that there was once someone who was the president and who was a Democrat. To represent the meaning of "the president of the United States" given this interpretation requires an intensional description that depends on time. Thus we might write president(t), where president is some function that maps instances of time onto instances of people, namely U.S. presidents.
Finding the Right Structures as Needed
Suppose we have a script (a description of a class of events in terms of contexts, participants, and subevents) that describes the typical sequence of events in a restaurant. This script would enable us to take a text such as
John went to Steak and Ale last night. He ordered a large rare steak, paid his bill, and left.
and answer "yes" to the question:
Did John eat dinner last night?
Notice that nowhere in the story is it mentioned explicitly that John ate anything. But the fact that when one goes to a restaurant, one eats, will be contained in the restaurant script. If we know in advance to use the restaurant script, then we can answer the question easily. But in order to be able to reason about a variety of things, a system must have many scripts for everything from going to work to sailing around the world. How will it select the appropriate one each time? For example, nowhere in our story was the word "restaurant" mentioned.
In fact, in order to have access to the right structure for describing a particular situation, it is necessary to solve all of the following problems.
How to perform an initial selection of the most appropriate structure.
How to fill in appropriate details from the current situation.
How to find a better structure if the one chosen initially turns out not to be appropriate.
What to do if none of the available structures is appropriate.
When to create and remember a new structure.
There is no good general purpose method for solving all these problems. Some knowledge-representation techniques solve some of them. In this section we survey some solutions to two of these problems: how to select an initial structure to consider and how to find a better structure if that one turns out not to be a good match.
Selecting an Initial Structure
Selecting candidate knowledge structures to match a particular problem-solving situation is a hard problem; there are several ways in which it can be done. Three important approaches are the following:
1. Index the structures directly by the significant English words that can be used to describe them. For example, let each verb have associated with it a structure that describes its meaning. This is the approach taken in conceptual dependency theory. Even for selecting simple structures, such as those representing the meanings of individual words, though, this approach may not be adequate, since many words may have several distinct meanings. For example, the word "fly" has a different meaning in each of the following sentences:
- John flew to New York. (He rode in a plane from one place to another.)
- John flew a kite. (He held a kite that was up in the air.)
- John flew down the street. (He moved very rapidly.)
- John flew into a rage. (An idiom)
Another problem with this approach is that it is only useful when there is an English description of the problem to be solved.
2. Consider each major concept as a pointer to all of the structures (such as scripts) in which it might be involved. This may produce several sets of prospective structures. For example, the concept Steak might point to two scripts, one for restaurant and one for supermarket. The concept Bill might point to a restaurant and a shopping script. Take the intersection of those sets to get the structure(s), preferably precisely one, that involves all the content words. Given the pointers just described and the story about John's trip to Steak and Ale, the restaurant script would be evoked. One important problem with this method is that if the problem description contains any even slightly extraneous concepts, then the intersection of their associated structures will be empty. This might occur if we had said, for example, "John rode his bicycle to Steak and Ale last night." Another problem is that it may require a great deal of computation to compute all of the possibility sets and then to intersect them. However, if computing such sets and intersecting them could be done in parallel, then the time required to produce an answer would be reasonable even if the total number of computations is large. (A small sketch of this intersection method appears after this list.)
3. Locate one major clue in the problem description and use it to select an initial structure. As other clues appear, use them to refine the initial selection or to make a completely new one if necessary. The major problem with this method is that in some situations there is not an easily identifiable major clue. A second problem is that it is necessary to anticipate which clues are going to be important and which are not. But the relative importance of clues can change dramatically from one situation to another. For example, in many contexts, the colour of the objects involved is not important. But if we are told “The light turned red," then the colour of the light is the most important feature to consider.
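The intersection strategy of method 2 above is easy to sketch; the concept-to-script pointers below are invented for illustration:

scripts_for = {
    "steak":   {"restaurant", "supermarket"},
    "bill":    {"restaurant", "shopping"},
    "bicycle": {"riding"},
}

concepts = ["steak", "bill"]
print(set.intersection(*(scripts_for[c] for c in concepts)))   # {'restaurant'}

# One slightly extraneous concept empties the intersection:
print(set.intersection(*(scripts_for[c] for c in concepts + ["bicycle"])))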
Unfortunately, none of these proposals seems to be a complete answer to the problem. And the more complex the knowledge structures are, the harder it is to tell when a particular one is appropriate.
Revising the Choice When Necessary
Depending on the representation we are using, the details of the matching process will vary. It may require variables to be bound to objects. It may require attributes to have their values compared. In any case, if values that satisfy the required restrictions as imposed by the knowledge structure can be found, they are put into the appropriate places in the structure. If no appropriate values can be found, then a new structure must be selected. The way in which the attempt to instantiate this first structure failed may provide useful cues as to which one to try next. If, on the other hand, appropriate values can be found, then the current structure can be taken to be appropriate for describing the current situation. But, of course, that situation may change. Then information about what happened (for example, we walked around the room we were looking at) may be useful in selecting a new structure to describe the revised situation.
When the process runs into a snag, though, it is often not necessary to abandon the effort and start over. Rather, there are a variety of things that can be done:
Select the fragments of the current structure that do correspond to the situation and match them against candidate alternatives. Choose the best match. If the current structure was at all close to being appropriate, much of the work that has been done to build substructures to fit into it will be preserved.
Make an excuse for the current structure's failure and continue to use it. For example, a proposed chair with only three legs might simply be broken. Or there might be another object in front of it which occludes one leg. Part of the structure should contain information about the features for which it is acceptable to make excuses. Also, there are general heuristics, such as the fact that a structure is more likely to be appropriate if a desired feature is missing (perhaps because it is hidden from view) than if an inappropriate feature is present. For example, a person with one leg is more plausible than a person with a tail.
Refer to specific stored links between structures to suggest new directions in which to explore. An example of this sort of linking among a set of frames is shown in the similarity network shown in Figure 3.11.
If the knowledge structures are stored in an isa hierarchy, traverse upward in it until a structure is found that is sufficiently general that it does not conflict with the evidence. Either use this structure if it is specific enough to provide the required knowledge or consider creating a new structure just below the matching one.
Student Activity 3.2
Before reading the next section, answer the following questions.
1. Write short notes on inferential knowledge and procedure knowledge.
2. What are the main issues in knowledge representation?
3. What is the use of “instance” and “isa” attributes?
If your answers are correct, then proceed to the next section.
The Frame Problem
We have seen several methods for representing knowledge that would allow us to form complex state descriptions for a search program. The next concern is how to represent efficiently the sequences of problem states that arise from a search process. For complex, ill-structured problems, this can be a serious matter.
Figure 3.11: A Similarity Net
Consider the world of a household robot. There are many objects and relationships in the world, and a state description must somehow include facts like on(Plant12, Table34), under(Table34, Window13), and in(Table34, Room15). One strategy is to store each state description as a list of such facts. Most of the facts will not change from one state to another, yet each fact will be represented once at every node, and we will quickly run out of memory. Furthermore, we will spend the majority of our time creating these nodes and copying these facts, most of which do not change often, from one node to another. For example, in the robot world, we could spend a lot of time recording above(Ceiling, Floor) at every node. And this is in addition to the real problem of figuring out which facts should be different at each node.
This whole problem of representing the facts that change as well as those that do not is known as the frame problem. In some domains, the only hard part is representing all the facts. In others, though, figuring out which ones change is nontrivial. For example, in the robot world, there might be a table with a plant on it under the window. Suppose we move the table to the center of the room. We must also infer that the plant is now in the center of the room too, but that the window is not.
To support this kind of reasoning, some systems make use of an explicit set of axioms called frame axioms, which describe all the things that do not change when a particular operator is applied in state n to produce state n+1. Thus, in the robot domain, we might write axioms such as:
color(x, y, s1) ∧ move(x, s1, s2) → color(x, y, s2)
which can be read as, "If x has colour y in state s1 and the operation of moving x is applied in state s1 to produce state s2, then the colour of x in s2 is still y." Unfortunately, in any complex domain, a huge number of these axioms becomes necessary. An alternative approach is to make the assumption that the only things that change are the things that must. By "must" here we mean that the change is either required explicitly by the axioms that describe the operator or that it follows logically from some change that is asserted explicitly. This idea of circumscribing the set of unusual things is a very powerful one; it can be used as a partial solution to the frame problem and as a way of reasoning with incomplete knowledge.
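One concrete way to realize the assumption that only the things that must change do change, which the text does not name but which is common in STRIPS-style planners, is to describe each operator by an add list and a delete list; everything else carries over untouched. A hedged sketch:

# A state is a set of facts; an operator lists only what it changes.
state = {"on(Plant12, Table34)", "under(Table34, Window13)",
         "in(Table34, Room15)"}

move_table_to_center = {
    "delete": {"under(Table34, Window13)"},
    "add":    {"in-center(Table34, Room15)"},
}

def apply_op(state, op):
    return (state - op["delete"]) | op["add"]

# on(Plant12, Table34) persists with no frame axiom needed; inferring that
# the plant is now also in the center still requires extra reasoning.
print(apply_op(state, move_table_to_center))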
Let us return briefly to the problem of representing a changing problem state. We can do that simply by starting with a description of the initial state and then making changes to that description as indicated by the rules we apply. This solves the problem of the wasted space and time involved in copying the information for each node. And it works fine until the first time the search has to backtrack. Then, unless all the changes that were made can simply be ignored, we are faced with the problem of backing up to some earlier node. But how do we know what changes in the problem state description need to be undone? There are two ways this problem can be solved.
- Do not modify the initial state description at all. At each node, store an indication of the specific changes that should be made at this node. Whenever it is necessary to refer to the description of the current problem state, look at the initial state description and also look back through all the nodes on the path from the start state to the current state. This approach makes backtracking very easy, but it makes referring to the state description fairly complex.
- Modify the initial state description as appropriate, but also record at each node an indication of what to do to undo the move should it ever be necessary to backtrack through the node. Then, whenever it is necessary to backtrack, check each node along the way and perform the indicated operations on the state description.
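The second option amounts to keeping an undo record per move. A minimal sketch, reusing the add/delete style of the earlier fragment (our own illustration):

trail = []   # one (added, deleted) record per move, newest last

def do_move(state, op):
    added, deleted = op["add"] - state, op["delete"] & state
    trail.append((added, deleted))   # record exactly what this move changed
    return (state | added) - deleted

def backtrack(state):
    added, deleted = trail.pop()     # undo only the most recent move
    return (state - added) | deleted

s = {"under(Table34, Window13)"}
s = do_move(s, {"add": {"in-center(Table34)"},
                "delete": {"under(Table34, Window13)"}})
s = backtrack(s)
print(s)   # the original state is restored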
Student Activity 3.3
Answer the following questions.
1. What do you understand by a frame? How is it useful in knowledge representation?
2. What are frame axioms, and where are they used?
Summary
- Facts: truths in some relevant world. These are the things we want to represent.
- The forward representation mapping maps from facts to representations. The backward representation mapping goes the other way, from representations to facts.
- Representations of facts in some chosen formalism.
- The relational knowledge corresponds to a set of attributes and associated values that together describe the objects of the knowledge base.
- It is possible to augment the basic representation with inference mechanisms that operate on the structure of the representation. For this to be effective, the structure must be designed to correspond to the inference mechanisms that are desired.
- One of the most useful forms of inference is property inheritance, in which elements of specific classes inherit attributes and values from more general classes in which they are included.
- In order to support property inheritance, objects must be organized into classes and classes must be arranged in a generalization hierarchy.