Reputation: 1499
I am looking for the most efficient way to match an NSString against about 200,000 other strings in the minimum amount of time. Basically I need to check, every once in a while, whether or not a word that is inputted is part of the English language.
What are the ways I could do this? I've heard about hashtables-- is that the best way?
EDIT: benchmarks for dictionary initialization to memory:
NSDate *start = [NSDate date];
NSLog(@"initWords");
//temporary
NSString *_filePath = [[NSBundle mainBundle] pathForResource:kFILE_NAME_FOR_DICTIONARY ofType:@"txt"];
NSLog(@"%@",_filePath);
NSString *_fileContents = [[NSString alloc] init];
NSData *_binary = [NSData dataWithContentsOfFile:_filePath];
if (_binary) {
_fileContents = [[NSString alloc] initWithData:_binary encoding:NSUTF8StringEncoding];
} else {
NSLog(@"file parse error: did you forget to add the file? Please add file to:\n\n\n\n%@\n\n\n\n",[[NSBundle mainBundle] resourcePath]);
}
NSArray *_wordList = [_fileContents componentsSeparatedByString:kNEW_LINE_CHAR];
englishDictionary = [[NSMutableSet alloc] init];
[englishDictionary addObjectsFromArray:_wordList];
NSLog(@"Word count:\t%d",englishDictionary.count);
NSLog(@"Time to init dictionary:\t%f",[start timeIntervalSinceNow]*-1.);
iphone 5: 1.089725 (seconds)
ipad 1: 3.082753
iphone 4: 3.582853
Benchmark (time to test word if it is in the dictionary or not):
-(BOOL)checkWord:(NSString *)word{
NSDate *start = [NSDate date];
BOOL yesNoMaybeSo = [englishDictionary containsObject:word];
NSLog(@"Time to check word:\t%f",[start timeIntervalSinceNow]*-1.);
return yesNoMaybeSo;
}
iphone 5: 0.000021 (seconds)
ipad 1: 0.000037
iphone 4: 0.000043
Upvotes: 0
Views: 345
Reputation: 56059
If you don't mind rolling your own, you can build a trie (a.k.a prefix tree) from the dictionary and use it to look up the words. I'd bet a reasonably good implementation using c strings and structs would be faster and more memory-efficient than an NSDictionary.
Upvotes: 0
Reputation: 14160
With such requirements standard methods/classes will not fit your needs. You should learn/read some books on algorithms to implement this correctly. Looks like you should use hash tables, but, again, with 200K values NSDictionary (which is hash table) probably will not work fast enough.
Upvotes: 1
Reputation: 108101
The most efficient way is probably to use a NSSet
to store all the words you want to compare against your string.
Then you simply check whether your word belongs to the set like follows
BOOL englishWord = [theEnglishSet containsObject:yourString];
This will take constant time to perform.
Upvotes: 2