Reputation: 3814
Best way to implement a
When asked in a one hour interview, are we expected to write a c/c++ code, for the algorithm?
Upvotes: 11
Views: 10282
Reputation: 10828
I can see no better data structure than a trie for the dictionary and the thesaurus. Both can be fitted in one structure if needed with one link in the node pointing to the meaning of the word (dictionary) and one to synonyms (thesaurus). It can even offer some form of autocompletion (when there's only one link in the node).
Spelling corrector is a bit trickier - since one has to map fals input to some kind of correct input. You can take this link as a start: http://en.wikipedia.org/wiki/Spell_checker. At the end you'll find links to papers about different algorithms. According to the wikipedia article, this paper describes the most successfull algorithm: Andrew Golding and Dan Roth's "Winnow-based spelling correction algorithm"
Upvotes: 2
Reputation: 147
For the dictionary, there is indeed a data structure superior to the trie. Try a DAWG, or CDAWG: http://en.wikipedia.org/wiki/Directed_acyclic_word_graph. Just to complicate matters, my favorite paper on the structure, Ciura and Deorowicz's "How to Squeeze a Lexicon" calls them "minimal ADFAs". Google around and you'll find many competing algorithms for constructing these beasts. Good luck!
Upvotes: 4
Reputation: 101149
In all three cases, you can construct a BK-Tree from your word set. BK-Trees let you find all words within a given edit distance of the entered word. See my blog post on BK-Trees for an explanation of how they work.
A dictionary and spell checker are more or less identical - the dictionary simply needs to provide definitions along with the words. For a thesaurus, words are clustered into 'synsets' - synonym sets - with all the elements inserted into the BK-Tree. When someone looks up one word in the synset, you display all the other ones as alternatives. A word can appear in multiple synsets, so you need to ensure your BK-Tree nodes can handle multiple values for a given key.
Upvotes: 1
Reputation: 14446
For a dictionary I would use the std::map
(calling Dictionary
in the .Net framework) collection with the word as key and a custom object (with all information about the word + the definition) as value.
For a thesaurus the best structure is a tree where each node is a section and where each branch finished with an object which contains all information about what you have to display.
Upvotes: 1
Reputation: 115897
See this for a 21-line Python 2.5 spelling corrector and a bit of background.
Upvotes: 6