Reputation: 17745
I searched for this before I post but I couldn't fine something that helps me. I'm using java. I have a file of 300.000 words (already sorted alphabeticaly). I want to load these words in a structure and search if a word that I'll pass exists or not. I want something that is best for string search. I've seen about tries (suffix trees) and red-black trees (TreeSet - since I want only keys, and no values- in java).
Please if you consider answering provide some explanation about the efficiency of your proposition. Thank you.
EDIT The structure will be created by loading the file, and there will be no further adding of words. Case sensitivity is not required. I didn't know what stemming is. I know now, but I don't know if it would help. The file is a dictionary (no translation, just the words of a given language).
Upvotes: 0
Views: 337
Reputation: 96
The Hash would be your optimal solution. It searches in constant time as oposed to the treeset wich is log(n) time.
You can also store in constant time, if you declare the set big enough at creation.
http://docs.oracle.com/javase/6/docs/api/java/util/HashMap.html
The creation would be in time: n , and you will need to have the sorted set contained in a seperate structure.
This is a solution optimized for searching for duplicates not for memory or adding data.
Upvotes: 2