How do I use nltk.containers.Trie?

Question

I want to use nltk.containers.Trie to perform simple operations like inserting a word into the trie, retrieving all words with a given prefix, find nodes with most descendants (i.e. most common prefixes), graphically viewing the trie and so on. I couldn't find any documentation whatsoever regarding the use of this structure. Here's all I have so far:

from nltk.containers import Trie
t = Trie()

I now have a list of words which I need to add to the trie.

alexis · Accepted Answer

It's pretty cryptic, isn't it. It's basically a dictionary but you can additionally check if a string is a prefix of a known key:

>>> t = Trie()
>>> t['they'] = 15
>>> 'the' in t
True
>>> print t['the']
None

There's also find_prefix, which will match as much of its argument as possible, and return the value it finds there (if any) plus the remainder of the argument:

>>> t.find_prefix("theirs")
(None, 'irs')                 # Prefix "the" has no value

You could take a look at the source in nltk/containers.py. The magic is in the methods __setitem__ and __getitem__, which handle expressions of the form t[key].

Also good to know: The keys() method will only return real entries, not prefixes. You can use it with the method subtrie to retrieve all words that begin with a given prefix:

>>> t.subtrie('th').keys()
['ey']

PS. Note that containers.py was removed from the NLTK about six months ago! Before you update your nltk distribution (which you should), save nltk/containers.py under a different name. Better yet, just save the Trie class. (The rest of the file is obsolete).

How do I use nltk.containers.Trie?

Answers (1)

Related Questions