Parsing Natural Language

Question

What's the most efficient way to parse natural language?

Let "strings" be a map containing strings such as:

Set the alarm for *.
Call *.
Get me an * at * for *.

and their corresponding functions. Now suppose "input" is a string containing a sentence like:

Call David.

How to implement a function such as parse which would take the "input" and match it to one of the strings in the map. Then call its corresponding function, passing it argc and argv containing all the wild card entires (* in strings). What's the most efficient way to implement such a function?

MSalters · Accepted Answer

Not sure why this question got a downvote. It's well-posed an non-trivial.

There are plenty of academic approaches to parsing, which are mostly needed for degenerate grammars. "natural language" is perhaps not a well-defined term, and natural languages do have some ambiguity, but such constrained subsets are not problematic.

In this specific example, we see that the different production rules (map entries) are not mutually ambiguous. In fact, the first token is sufficient for disambiguation. And since a std::map is sorted, we can do an efficient O(log N) search for that token.

Hence, we only need to derive the substitutions. Again, we'll ignore the degenerate cases. Nobody is going to bother with "Get me an at at at for at."`, even though it parses unambiguously.

Instead, for substitutions you simply collect tokens until you get the expected next token. Get me an * at * for *. means that the first * gets all tokens up to at, the second * collects tokens up to for, and the final * gets all remaining tokens.

You see that no backtracking is needed. If parsing fails, there simply is no match.

Parsing Natural Language

Answers (1)

Related Questions