Artificial Intelligence and Natural Language Processing

[LIS 521: Knowledge Representation]

Presented by Ken Thompson

May 21, 1998

A Debriefing

 

Although my putative topic was Natural Language Processing (NLP), and I had originally intended to look at the different ways in which humans and machines process language, in the end my presentation focussed on trying to establish how the knowledge representation system developed by Minsky and others had created something of a dead end for Artificial Intelligence (AI) in general, and NLP specifically. It attempted to draw together some of the other topics presented (Predicate Logic, Semantic Networks), looked at how some of the underlying assumptions in these fields are really attempts to create a kind of machine-based language system, and examined the field of Neural Networks as an alternative organizational model.

The first carefully controlled experiment we performed was called "Darlene vs. the Computer." The computer was easily able to add various numbers together, multiply by another, then divide by another integer. Darlene was not. On the other hand, Darlene was able to easily catch the wad of paper I threw to her, as well as throw it right back to me, while the computer was not. I speculated that even if the computer had had vision and a mechanical arm, it would not be able to calculate the trajectory, weight, angle, etc. of the wad of paper as quickly as Darlene did. I speculated that this might be because people utilize neural networks (which are inherently "fuzzy" and in which a lot of "best guessing" goes on), whereas computers use exact, brute computational force.

We then looked at the Minsky reading from Society of Mind, and compared some of Minsky’s precepts to the basic levels at which NLP views language. The following chart was used:

NLP                                          | Minsky
Pragmatic (world knowledge)                  | A and B Brains (monitoring systems)
Discourse (> sentences)                      | Societies (layers of K-lines)
Semantic (sentence level)                    | K-lines (memories)
Syntactic (building words into sentences)    | Desires/Goals (difference engines)
Lexical (word level)                         | Agencies (e.g. Builder)
Morphological (word parts)                   | Agents (e.g. Add)
Phonological (speech sounds)                 | Object (e.g. blocks)

Without going into the specifics here of what each of these 14 items means, let it suffice to say that the important element was that both of these systems have very small/basic elements at the bottom, and that the levels are additive; that is to say, each level is composed of multiple instances of the level below it. The model is this: complex systems can be built out of very simple building blocks (this is the basic idea behind evolution). So NLP advocates believe a computer can ‘understand’ language if only they can start with word parts, build those into words, establish the rules for how words relate to each other in phrases, establish grammatical and syntactic rules to build words into sentences, understand how sentences relate to each other, and somehow incorporate all the implied information in a natural language. Likewise, Minsky’s Society is built up from very small units, all interconnected and functioning together, but in essence, no individual part is smarter than On/Off.

We looked at some examples of AI to see how this schema had worked out. First we looked at the game Quake2 (download your own copy from www.quake.com). We examined the world of Quake2 as one constructed from the bottom up: small programs building on each other to create an entire world. It goes something like this: every object in Quake2 has a set of attributes: walls cannot be walked through; windows can be broken; bridges can be jumped off of; bullet A does x damage; bullet B does y damage; grenade launchers can fire x rounds/minute; machine guns can fire y rounds/minute. At the next level up, MonsterA uses machine guns only and can run x feet/second, while SoldierA uses lasers only and runs at y feet/second. Further, MonsterA will chase you out of a room, while SoldierA will always stop. Thus from a set of qualities of the smallest items at the bottom, as they become used by entities at a higher level, a ‘fully fleshed’ world begins to be realized. However, as one plays the game, some ‘unnatural’ restrictions become evident: everyone fights to the death; the world is split into 4 levels, and no one but the player can move between levels; etc. Moreover, it is not a game that learns about you as you play it. If it had an adaptive intelligence, it would learn about you and the tricks that you use. It is exactly this type of pattern recognition (i.e., learning) that neural network systems do well. But more on that later.
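The bottom-up construction described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Quake2's actual code: the class names, field names, and every number below are made up for the example. It shows how fixed attributes at the lowest level (weapons) are simply reused by the next level up (creatures), and that nothing in such a scheme learns anything about the player.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Weapon:
    """Lowest level: a fixed bundle of attributes (all values hypothetical)."""
    name: str
    damage: int              # damage per hit
    rounds_per_minute: int   # firing rate


@dataclass(frozen=True)
class Creature:
    """Next level up: built out of lower-level pieces plus a few fixed rules."""
    name: str
    weapon: Weapon
    speed_ft_per_s: float
    chases_out_of_room: bool

    def damage_per_minute(self) -> int:
        # Higher-level behavior derived entirely from lower-level attributes.
        return self.weapon.damage * self.weapon.rounds_per_minute

    def reaction_when_player_leaves(self) -> str:
        # A hard-wired rule: it never changes, however the player behaves.
        return "chase" if self.chases_out_of_room else "stop"


MACHINE_GUN = Weapon("machine gun", damage=8, rounds_per_minute=600)
LASER = Weapon("laser", damage=20, rounds_per_minute=120)

MONSTER_A = Creature("MonsterA", MACHINE_GUN, speed_ft_per_s=12.0, chases_out_of_room=True)
SOLDIER_A = Creature("SoldierA", LASER, speed_ft_per_s=9.0, chases_out_of_room=False)

if __name__ == "__main__":
    for creature in (MONSTER_A, SOLDIER_A):
        print(creature.name,
              creature.damage_per_minute(),
              creature.reaction_when_player_leaves())
```

Every behavior here has to be written in by hand at some level, which is the limitation the game exhibited.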

We also interacted with some online AI programs, to see how naturally they used language and responded to our queries. We visited Eliza (www-ai.ijs.si/eliza-cgi-bin/eliza_script), that venerable AI therapist. Eliza asks you questions in response to yours. It was easy to see the failings of Eliza: she had many pat responses that would be triggered when no particular response seemed called for – "tell me more," or "what makes you say that?" Other words in the input were obviously used to trigger certain responses – entering the word "mother" anywhere invariably drew the response "tell me more about your family." We also visited the MilkMystic (www.whymilk.com). This program was a specialist. Unlike Eliza and some others, who could be quizzed on any topic, the MilkMystic took only questions about milk. It was able to answer some questions fairly well. Questions about vitamins drew a list of nutrients. It seemed to know that milk came from cows. Further questions about cows ("where do they live?") seemed to confuse it. Randomly, I asked it who was the President. It correctly responded "Bill Clinton." We speculated that enough people had asked this off-topic question that the programmer had decided to include a response. Overall, these AI programs were not convincing, and showed the bias of the ‘building-block’ approach to AI where one has to account for ALL the variables at the bottom level in order to simulate intelligence.
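The trigger-word behavior we saw can be imitated with a very small program. The sketch below is my own guess at the general technique, not Eliza's actual script: the function name make_responder and the tables are hypothetical, and the only rule taken from our session is that "mother" draws the family response while everything else gets a pat reply.

```python
import itertools
import re

# Trigger words mapped to canned replies (only "mother" comes from our session).
KEYWORD_RESPONSES = {
    "mother": "Tell me more about your family.",
}

# Pat replies used when no trigger word is found.
PAT_RESPONSES = ["Tell me more.", "What makes you say that?"]


def make_responder():
    """Return a respond(text) function that cycles through the pat replies."""
    fallback = itertools.cycle(PAT_RESPONSES)

    def respond(text: str) -> str:
        # Split the input into lowercase words and look for any trigger word.
        for word in re.findall(r"[a-z']+", text.lower()):
            if word in KEYWORD_RESPONSES:
                return KEYWORD_RESPONSES[word]
        # Nothing matched: fall back to the next pat reply.
        return next(fallback)

    return respond


if __name__ == "__main__":
    respond = make_responder()
    for line in ["My mother called today.", "I like trains.", "Really."]:
        print(">", line)
        print(respond(line))
```

The program never models what the sentence means; each new topic it should handle needs another hand-written entry in the table.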

We also conducted a double-blind test wherein the class read two very short stories, one written by a person and one written by a computer (Brutus.1). Both were on the subject of betrayal (they are attached at the end of this report). Six members of the class correctly identified the computer-written story; three thought the reverse was true. People used a variety of clues to try to identify the computer-written narrative. One incorrect guesser thought that the story with more details was the human-written story. Someone else was tipped off by an unusual word in the human-written story and thought that a machine would probably not have that vocabulary, or would have used a simpler word. Others pointed to the trite details of the computer-generated story as a giveaway. Another correct guesser pointed to the very linear and logical arrangement of the computer’s tale.

Brutus.1 was programmed to ‘know’ a number of things: English grammar, a given vocabulary, short story plots, various literary devices, various human behaviors, and, in depth, the university environment and structure (the setting for the story). The programmers then devised a mathematical formula for the concept of ‘betrayal’ and fed it into the computer. It looks like this:

An agent B betrays agent A if and only if there exists some state of affairs p such that
1) A wants p to occur;
2) B believes that A wants p to occur;
3) EITHER a) B agrees with A that p ought to occur and A wants some action a that B performs in the belief that p will thereby occur OR b) A wants no action a that B performs in the belief that p will thereby not occur;
4) there is some action a such that a) B performs a in the belief that p will thereby not occur; AND b) there exists no state of affairs q such that q is believed by B to be good for A and B performs a in the belief that q will not occur; AND FINALLY
5) B believes that there is some action a that B will perform in the belief that p will thereby occur.
(Bringsjord p. 26)

All in all, however, despite the simplicity and bad prose created by the computer, we were amazed at Brutus.1’s command of the English language, and its (his?) ability to accurately compose on the given theme.

We then went on to talk about neural networks. These are computer programs modeled on human biological systems: the brain’s system of neurons, dendrites, and axons. Neural nets are in use in a variety of capacities, among them mortgage risk assessment, bomb sniffing in airports, speech recognition, airline marketing tactics, voice transcription systems, and handwriting verification systems. What do all these have in common? For starters, they all deal with the recognition of patterns. The airline marketing program looks for patterns of seat purchases and decides how many special fares are needed for future flights. It constantly monitors all purchasing decisions, and makes decisions and revisions as time passes (read: as it learns new patterns of purchasing). The other element common to most neural nets is that they deal with signal inputs (like sensory inputs). Signal inputs are easily translated into patterns. Neural nets are said to have these qualities: they react, they behave, they self-organize, they learn, they forget.

They are able to do these things thanks to their design, which is borrowed from the biological neuron. Signals come in through the dendrites, are held in the cell body, then go out through the axon. The cell receives multiple inputs via the dendrites, stores the input, then decides whether to send a single output via the axon. The various inputs are not all treated equally (some input is worth more than others), and if the incoming signals do not reach the locally prescribed threshold level, no outgoing signal is sent. Some outgoing signals may in fact loop back to the same cell, as neurons (10 billion in all) are connected in a web-like structure (network!). This setup seems to make the human brain very good at recognizing patterns, and filling in the gaps in patterns (fuzzy logic).
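The weighing-and-threshold step can be written out as a tiny Python function. This is a minimal sketch of a single threshold unit under my own assumptions: the name neuron_fires, the weights, and the threshold value are all made up for illustration, and real neural net software does a good deal more (in particular, it adjusts the weights as it learns).

```python
def neuron_fires(inputs, weights, threshold):
    """Return 1 if the weighted sum of the inputs reaches the threshold, else 0."""
    if len(inputs) != len(weights):
        raise ValueError("each input needs exactly one weight")
    # Inputs are not treated equally: each is scaled by its own weight.
    total = sum(x * w for x, w in zip(inputs, weights))
    # No outgoing signal unless the combined input reaches the threshold.
    return 1 if total >= threshold else 0


if __name__ == "__main__":
    weights = [0.6, 0.9, 0.3]   # hypothetical "worth" of each incoming signal
    threshold = 0.8             # hypothetical local threshold
    print(neuron_fires([1, 0, 1], weights, threshold))  # weighted sum about 0.9
    print(neuron_fires([0, 0, 1], weights, threshold))  # weighted sum 0.3
```

With these numbers the first set of inputs reaches the threshold and the second does not, so only the first produces an outgoing signal.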

In an attempt to wrap things up, we then briefly spoke about the reasons that the computational approach has occupied the spotlight in AI/NLP research for the past 30 years. Minsky, in the seminal 1969 book Perceptrons (written with Seymour Papert), argued vehemently against the neural network model. His position within the community was such that he held sway, and not coincidentally his own research rose to the top. Since the early 1990s neural nets have received renewed attention as researchers realize the limitations of the computational model of AI. However, the Minskian/computational methodology has been an incredibly successful way of representing knowledge. It has permeated all forms of computing, and is likely to dominate it as a paradigm for some time.

 

 

Assigned Readings:

Cullingford, Richard E. 1986. Natural Language Processing: a knowledge engineering approach. Pp. 1-13

Grolier’s Encyclopedia [online version]. 1998. Entries for "artificial intelligence" and "neural networks."

Highfield, Roger. "Thousands are taken in by robotic patter: Megalab 98 has been another mega success." The Daily Telegraph, March 21, 1998. Pg. 22.

Kaufman, Herbert. 1994. The emergent kingdom: machines that think like people. The Futurist 28(1):20-?.

McTear, Michael. 1987. The Articulate Computer. Pp. 71-77.

Minsky, Marvin. 1985. The Society of Mind. Selected pages.

Schuytema, Paul C. 1996. "Simtelligence: tips for playing Raven’s Hexen and Activision’s MechWarrior 2." Computer Gaming World #138, p.336-?.

 

Also used to prepare presentation:

Nelson, Marilyn McCord. 1990. A practical guide to neural nets. Addison-Wesley Publishing.

Bringsjord, Selmer. 1998. "Chess is Too Easy." Technology Review March/April 1998. p.23-28.

Liddy, Elizabeth D. 1998. "Enhanced Text Retrieval Using Natural Language Processing." Bulletin of the American Society for Information Science April/May 1998. p.14-16.