Reputation: 23
Simply put I want to check if a specified word exists or not.
The lookup needs to be very fast which is why I decided to store the dictionary in a trie. So far so good! My trie works without issues. The problem is filling the trie with a dictionary. What I'm currently doing is looping through every line of a plain text file that is the dictionary and adding each word to my trie.
This is understandably so an extremely slow process. The file contains just about 120 000 lines. If anyone could point me in the right direction for what I could do it would be much appreciated!
This is how I add words to the trie (in Boo):
trie = Trie()
saol = Resources.Load("saol") as TextAsset
text = saol.text.Split(char('\n'))
for new_word in text:
trie.Add(new_word)
And this is my trie (in C#):
using System.Collections.Generic;
public class TrieNode {
public char letter;
public bool word;
public Dictionary<char, TrieNode> child;
public TrieNode(char letter) {
this.letter = letter;
this.word = false;
this.child = new Dictionary<char, TrieNode>();
}
}
public class Trie {
private TrieNode root;
public Trie() {
root = new TrieNode(' ');
}
public void Add(string word) {
TrieNode node = root;
bool found_letter;
int c = 1;
foreach (char letter in word) {
found_letter = false;
// if current letter is in child list, set current node and break loop
foreach (var child in node.child) {
if (letter == child.Key) {
node = child.Value;
found_letter = true;
break;
}
}
// if current letter is not in child list, add child node and set it as current node
if (!found_letter) {
TrieNode new_node = new TrieNode(letter);
if (c == word.Length) new_node.word = true;
node.child.Add(letter, new_node);
node = node.child[letter];
}
c ++;
}
}
public bool Find(string word) {
TrieNode node = root;
bool found_letter;
int c = 1;
foreach (char letter in word) {
found_letter = false;
// check if current letter is in child list
foreach (var child in node.child) {
if (letter == child.Key) {
node = child.Value;
found_letter = true;
break;
}
}
if (found_letter && node.word && c == word.Length) return true;
else if (!found_letter) return false;
c ++;
}
return false;
}
}
Upvotes: 2
Views: 2560
Reputation: 2623
Anything you do with CLI yourself will be slower then using the built-in functions. 120k is not that much for a dictionary.
First thing I would do is fire up the code performance tool.
But just some wild guesses: You have a lot of function calls. Just starting with the Boo C# binding in a for loop. Try to pass the whole text block and tare it apart with C#.
Second, do not use a Dictionary. You waste just about as much resources with your code now as you would just using a Dictionary.
Third, sort the text before you go inserting - you can probably make some optimizations that way. Maybe just construct a suffix table.
Upvotes: 0
Reputation: 11531
Assuming that you don't have any serious implementation problems, pay the price for populating the trie. After you've populated the trie serialize it to a file. For future needs, just load the serialized version. That should be faster that reconstructing the trie.
-- ADDED --
Looking closely at your TrieNode
class, you may want to replacing the Dictionary
you used for child
with an array. You may consume more space, but have a faster lookup time.
Upvotes: 4