Reputation: 428
I want to use nltk.containers.Trie to perform simple operations like inserting a word into the trie, retrieving all words with a given prefix, find nodes with most descendants (i.e. most common prefixes), graphically viewing the trie and so on. I couldn't find any documentation whatsoever regarding the use of this structure. Here's all I have so far:
from nltk.containers import Trie
t = Trie()
I now have a list of words which I need to add to the trie.
Upvotes: 2
Views: 1207
Reputation: 50220
It's pretty cryptic, isn't it. It's basically a dictionary but you can additionally check if a string is a prefix of a known key:
>>> t = Trie()
>>> t['they'] = 15
>>> 'the' in t
True
>>> print t['the']
None
There's also find_prefix
, which will match as much of its argument as possible, and return the value it finds there (if any) plus the remainder of the argument:
>>> t.find_prefix("theirs")
(None, 'irs') # Prefix "the" has no value
You could take a look at the source in nltk/containers.py
. The magic is in the methods __setitem__
and __getitem__
, which handle expressions of the form t[key]
.
Also good to know: The keys()
method will only return real entries, not prefixes. You can use it with the method subtrie
to retrieve all words that begin with a given prefix:
>>> t.subtrie('th').keys()
['ey']
PS. Note that containers.py
was removed from the NLTK about six months ago! Before you update your nltk distribution (which you should), save nltk/containers.py
under a different name. Better yet, just save the Trie
class. (The rest of the file is obsolete).
Upvotes: 4