Language Parsing
Overview
Transforming a written query to a structure that encodes its meaning is a search problem. In any human language, there is ambiguity involved at various levels of interpretation, and so arriving at a final interpretation of a query involves resolving that ambiguity by considering (searching) various possible interpretations. More specifically we need to do the following:
1. | Map words to entities. For example, the word "Daniel" may refer to the name "Daniel", or it may refer to me, Daniel Bigham (me), or it may refer to Daniel of the Bible. Since a query consists of not a single word but several words, there may be several, perhaps even hundreds of different word-to-entity mappings. |
2. | Transform linguistic fragments. The next step is to evaluate the validity of each of the possible word-to-entity mappings that result from the first step. To do this for a particular word-to-entity mapping, we evaluate whether it is possible to transform the list of entities to our internal knowledge representation. A transformation is defined by a pattern that, when found in the list of entities, might imply a certain representation. For example, when we see two nouns side by side, such as "my name", the second noun may refer to a property of the first. ie. DanielBigham.first_name. However, in the case of the two nouns "Daniel Bigham", the noun "Bigham", which is a last name, is not a property of the noun "Daniel", which is a first name. To reiterate: For each possible word-to-entity mapping from in step 1 above, we need to perform a second search, a transformation search, that considers what sequence of transformations we can apply to convert the written query to our structured knowledge representation. In most cases, only one of the word-to-entity mappings will have such a sequence of transformations, leaving us with a single interpretation of the query. |
An Example
Let's consider what needs to be done to transform the statement
"Daniel Bigham's age is 28" to our internal representation, which for this statement might be something like
DanielBigham.age = 28.
According to step #1 above, we need to consider what entities each of the words can map to. Note: We will treat the apostrophe-S as its own word. The words are, then:
Daniel, Bigham, 's, age, is, 28Before we proceed any further, we should clarify that the system includes an entity to represent each word itself apart from the other various entities that the word might refer to. For example, word
"Daniel" is an entity itself (which happens to be a first name), in addition to the entities
DanielBigham (myself) and
DanielOfTheBible:
  | "Daniel" (the word) |
  | DanielBigham (myself) |
  | DanielOfTheBible |
In this example, we'll say that Daniel refers to the word "Daniel" itself. Likewise, each of the other words in this statement refer to the entities that represent those actual words, except for age, which maps not to the word "age", but to an entity which means how old something is, and 28, which refers not to the word "28", but to an entity which represents the number 28. In review, the word-to-entity mapping we have is:
  | "Daniel" (the word) |
  | "Bigham" (the word) |
  | "'s" (the word) |
  | age (how old something is) |
  | "is" (the word) |
  | 28 (the number) |
Written out:
| "Daniel" "Bigham" "'s" age "is" 28 |
|
The next step is to attempt to transform this sequence of entities to an internal representation.
The first transformation we apply is as follows:
| "Daniel" "Bigham" -> DanielBigham |
|
What this says is that if we have the word "Daniel" followed by the word "Bigham", that might refer to the entity
DanielBigham. After applying this transformation we are left with:
| DanielBigham "'s" age "is" 28 |
|
The next transformation we apply is as follows:
| {noun} "'s" {noun} -> $1.$2 |
|
What this says, as noted above, is that if we have a noun followed by apostrophe-S, followed by another noun, the second noun may refer to a property of the first noun. When the system applies this transformation, it must ensure that the first entity, in this case
DanielBigham, does in fact have a property
age. After applying this transformation, we are left with:
The final transformation we apply is as follows:
| {noun} "is" {noun} -> $1 = $2 |
|
What this says is that, if we have a noun, followed by the word "is", followed by another noun, the second noun might be the value of the first noun. In other words,
28 might be the value of
DanielBigham.age. When the system applies this transformation, it must ensure that
28 is a valid value for
DanielBigham.age. It does this by checking that
28, which is a
number, is also an
age, which it is.
We are left with:
This is fully-transformed. ie. It is a statement compatible with our system's internal knowledge representation, so we can tell the system to execute this statement and it will remember the fact that the value of the
DanielBigham entity's
age property is
28.
The is_a Relationship
In the section above, we applied the following transformation:
Input fragment:
| DanielBigham "'s" age "is" 28 |
|
Transformation:
| {noun} "'s" {noun} -> $1.$2 |
|
Output fragment:
To do this, however, we needed to know that the entity
DanielBigham could be considered a
noun, and likewise for the entity
age. How did we know that?
The system accomplishes this by the convention that the
is_a relationship defines this fact. In other words:
DanielBigham is_a noun age is_a noun |
|
In fact, the system allows a chained relationship to define this fact as well. In other words:
DanielBigham is_a person person is_a noun |
|
Granted, this may seem strange in that a person isn't really a
noun per se.
The has_a Relationship
How did the system know that the
DanielBigham entity had an
age property? Via the
has_a relationship:
The system also allows indirect
has_a relationships via the
is_a relationship:
DanielBigham is_a person person has_a age |
|
The Algorithm
A simplified representation of the transformation search algorithm:
Given a transformation, t, let t[i] be the ith token of that transformation's input specification Let a list of entities be known as a fragment |
|
Let w be the list of entities which represent the words of the sentence Push w, a fragment, onto a stack F
While F is not empty Pop f off of the stack F Let e be the list of entities such that f[i] has either a direct or indirect is_a relationship with e[i] For j = 0 to length(f) - 1 Let t be the list of transformations such that for all x e[j] has a direct or indirect is_a relationship with t[x][0] and e[j+1] has a direct or indirect is_a relationship with t[x][1] For each transformation t[i] such that t[i] is a proper match Apply transformation t[i] to fragment f, resulting in fragment f' If f' is fully transformed Add f' to the output Else Push fragment f' onto the stack F
|
|