Reputation: 2724
I have written a program to implement a basic Trie in C++, every node having 26 child pointers (for English alphabets), and the Node class looks like this:
class Node
{
public:
Node* parent;
Node* child[26];
unsigned int number_of_children;
....
}
Now, there could be many words like {snapple, dapple}, {distract, attract}, etc, in which more than 3 alphabets match. I want to store distinct entries of these sub-words (like from the example above - apple, tract), and let others point to them (like {s-n-ptr_to_apple, d-ptr_to_apple}, {d-i-s-ptr_to_tract, a-t-ptr_to_tract}). I believe it is best to handle this while inserting a word itself, instead of having a function which performs this after the insertions have completed.
I need some help in designing this, at present I am not looking into execution efficiency, rather the code/design should be compact. Currently, I visit a node and check all the non-null siblings (by traversing along the sibling's children) for matching with the input word, and then store the pointers in case there is a match of say 4 words (but the code is getting longer and obfuscating).
Upvotes: 1
Views: 1477
Reputation: 275575
Traditional tries compress on common prefixes. You, in essence, want to compress on common suffixes. The easiest way is to just build your trie entries backward.
Now, this means you must read the string backwards into the trie.
Upvotes: 2