Language Parsing

Overview

Transforming a written query to a structure that encodes its meaning is a search problem. In any human language, there is ambiguity involved at various levels of interpretation, and so arriving at a final interpretation of a query involves resolving that ambiguity by considering (searching) various possible interpretations. More specifically we need to do the following:

1.Map words to entities. For example, the word "Daniel" may refer to the name "Daniel", or it may refer to me, Daniel Bigham (me), or it may refer to Daniel of the Bible. Since a query consists of not a single word but several words, there may be several, perhaps even hundreds of different word-to-entity mappings.

2.Transform linguistic fragments. The next step is to evaluate the validity of each of the possible word-to-entity mappings that result from the first step. To do this for a particular word-to-entity mapping, we evaluate whether it is possible to transform the list of entities to our internal knowledge representation. A transformation is defined by a pattern that, when found in the list of entities, might imply a certain representation. For example, when we see two nouns side by side, such as "my name", the second noun may refer to a property of the first. ie. DanielBigham.first_name. However, in the case of the two nouns "Daniel Bigham", the noun "Bigham", which is a last name, is not a property of the noun "Daniel", which is a first name. To reiterate: For each possible word-to-entity mapping from in step 1 above, we need to perform a second search, a transformation search, that considers what sequence of transformations we can apply to convert the written query to our structured knowledge representation. In most cases, only one of the word-to-entity mappings will have such a sequence of transformations, leaving us with a single interpretation of the query.

An Example

Let's consider what needs to be done to transform the statement "Daniel Bigham's age is 28" to our internal representation, which for this statement might be something like DanielBigham.age = 28.

According to step #1 above, we need to consider what entities each of the words can map to. Note: We will treat the apostrophe-S as its own word. The words are, then:

Daniel, Bigham, 's, age, is, 28

Before we proceed any further, we should clarify that the system includes an entity to represent each word itself apart from the other various entities that the word might refer to. For example, word "Daniel" is an entity itself (which happens to be a first name), in addition to the entities DanielBigham (myself) and DanielOfTheBible:

"Daniel" (the word)
DanielBigham (myself)
DanielOfTheBible

In this example, we'll say that Daniel refers to the word "Daniel" itself. Likewise, each of the other words in this statement refer to the entities that represent those actual words, except for age, which maps not to the word "age", but to an entity which means how old something is, and 28, which refers not to the word "28", but to an entity which represents the number 28. In review, the word-to-entity mapping we have is:

"Daniel" (the word)
"Bigham" (the word)
"'s" (the word)
age (how old something is)
"is" (the word)
28 (the number)

Written out:

"Daniel" "Bigham" "'s" age "is" 28

The next step is to attempt to transform this sequence of entities to an internal representation.

The first transformation we apply is as follows:

"Daniel" "Bigham" -> DanielBigham

What this says is that if we have the word "Daniel" followed by the word "Bigham", that might refer to the entity DanielBigham. After applying this transformation we are left with:

DanielBigham "'s" age "is" 28

The next transformation we apply is as follows:

{noun} "'s" {noun} -> $1.$2

What this says, as noted above, is that if we have a noun followed by apostrophe-S, followed by another noun, the second noun may refer to a property of the first noun. When the system applies this transformation, it must ensure that the first entity, in this case DanielBigham, does in fact have a property age. After applying this transformation, we are left with:

DanielBigham.age "is" 28

The final transformation we apply is as follows:

{noun} "is" {noun} -> $1 = $2

What this says is that, if we have a noun, followed by the word "is", followed by another noun, the second noun might be the value of the first noun. In other words, 28 might be the value of DanielBigham.age. When the system applies this transformation, it must ensure that 28 is a valid value for DanielBigham.age. It does this by checking that 28, which is a number, is also an age, which it is.

We are left with:

DanielBigham.age = 28

This is fully-transformed. ie. It is a statement compatible with our system's internal knowledge representation, so we can tell the system to execute this statement and it will remember the fact that the value of the DanielBigham entity's age property is 28.

The is_a Relationship

In the section above, we applied the following transformation:

Input fragment:
DanielBigham "'s" age "is" 28
Transformation:
{noun} "'s" {noun} -> $1.$2
Output fragment:
DanielBigham.age "is" 28

To do this, however, we needed to know that the entity DanielBigham could be considered a noun, and likewise for the entity age. How did we know that?

The system accomplishes this by the convention that the is_a relationship defines this fact. In other words:

DanielBigham is_a noun
age is_a noun

In fact, the system allows a chained relationship to define this fact as well. In other words:

DanielBigham is_a person
person is_a noun

Granted, this may seem strange in that a person isn't really a noun per se.

The has_a Relationship

How did the system know that the DanielBigham entity had an age property? Via the has_a relationship:

DanielBigham has_a age

The system also allows indirect has_a relationships via the is_a relationship:

DanielBigham is_a person
person has_a age

The Algorithm

A simplified representation of the transformation search algorithm:

Given a transformation, t, let t[i] be the ith token of that transformation's input specification
Let a list of entities be known as a fragment

Let w be the list of entities which represent the words of the sentence
Push w, a fragment, onto a stack F

While F is not empty
    Pop f off of the stack F
    Let e be the list of entities such that f[i] has either
    a direct or indirect is_a relationship with e[i]
    For j = 0 to length(f) - 1
        Let t be the list of transformations such that for all x
        e[j] has a direct or indirect is_a relationship with t[x][0] and
        e[j+1] has a direct or indirect is_a relationship with t[x][1]
        For each transformation t[i] such that t[i] is a proper match
            Apply transformation t[i] to fragment f, resulting in fragment f'
            If f' is fully transformed
                Add f' to the output
            Else
                Push fragment f' onto the stack F