Reputation: 5246
So, I have a dictionary file with 70,000 lines, sorted alphabetically. Each line holds a single word with its translation. What would you recommend as the best practice for searching such a file? I'm thinking about indexing the file, but maybe there are better ways.
Upvotes: 1
Views: 1515
Reputation: 200296
First of all, memory-map it using Java NIO's memory-mapped file support. Second, pre-process it to find all offsets at which a new entry starts. Finally, write some binary-search code that will find an entry. I think this could amount to the most lightweight and memory-efficient solution.
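To make those three steps concrete, here is a rough sketch rather than a definitive implementation. The class name `MappedDictionary` and its methods are made up for illustration. It assumes the file is UTF-8, that each line is `word<TAB>translation`, that the file is smaller than 2 GB (so `int` offsets suffice), and that the lines are sorted in the same order as `String.compareTo`. If your file is sorted with a locale-specific collation, you would have to compare with a matching `Collator` instead.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

// Hypothetical helper: memory-maps the file, records line offsets, binary-searches them.
final class MappedDictionary {
    private final MappedByteBuffer buf;
    private final int[] offsets; // byte offset at which each entry (line) starts

    MappedDictionary(Path file) throws IOException {
        // Step 1: memory-map the file. The mapping stays valid after the channel is closed.
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
        // Step 2: one pass over the bytes to find where every line starts.
        int[] tmp = new int[1024];
        int n = 0;
        boolean atLineStart = true;
        for (int i = 0; i < buf.limit(); i++) {
            if (atLineStart) {
                if (n == tmp.length) tmp = Arrays.copyOf(tmp, n * 2);
                tmp[n++] = i;
                atLineStart = false;
            }
            if (buf.get(i) == '\n') atLineStart = true;
        }
        offsets = Arrays.copyOf(tmp, n);
    }

    int size() {
        return offsets.length;
    }

    // Decodes entry idx from the mapped bytes, without its line terminator.
    String line(int idx) {
        int start = offsets[idx];
        int end = idx + 1 < offsets.length ? offsets[idx + 1] : buf.limit();
        while (end > start && (buf.get(end - 1) == '\n' || buf.get(end - 1) == '\r')) end--;
        byte[] bytes = new byte[end - start];
        for (int i = 0; i < bytes.length; i++) bytes[i] = buf.get(start + i);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Assumption: the word is everything before the first tab.
    static String key(String line) {
        int tab = line.indexOf('\t');
        return tab < 0 ? line : line.substring(0, tab);
    }

    // Step 3: binary search over the entries. Returns the full line, or null if absent.
    String lookup(String word) {
        int lo = 0, hi = offsets.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            String l = line(mid);
            int cmp = key(l).compareTo(word);
            if (cmp == 0) return l;
            if (cmp < 0) lo = mid + 1; else hi = mid - 1;
        }
        return null;
    }
}
```

With 70,000 entries the offsets array is about 280 KB, and a lookup decodes at most 17 lines or so.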
Lucene uses a similar idea with its skip lists: you can additionally cache every 16th (or so) entry in memory and use that cache for the first phase of the binary search. You then have to go to the actual file only to zero in on the exact entry.
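A sketch of that two-phase search, under my own assumptions and not a copy of what Lucene does internally: `SparseIndex` and `readKey` are hypothetical names, and `readKey` stands for whatever function reads the word of entry `i` out of the mapped file. Keys are again assumed to be sorted in `String.compareTo` order.

```java
import java.util.function.IntFunction;

// Hypothetical helper: keeps every step-th key in memory, reads the rest on demand.
final class SparseIndex {
    private final String[] cached; // keys of entries 0, step, 2*step, ...
    private final int step;
    private final int size;
    private final IntFunction<String> readKey; // reads the key of entry i from the file

    SparseIndex(int size, int step, IntFunction<String> readKey) {
        this.size = size;
        this.step = step;
        this.readKey = readKey;
        cached = new String[(size + step - 1) / step];
        for (int i = 0; i < cached.length; i++) cached[i] = readKey.apply(i * step);
    }

    // Returns the entry index of word, or -1 if it is not present.
    int find(String word) {
        // Phase 1 (memory only): find the last cached key that is <= word.
        int lo = 0, hi = cached.length - 1, block = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (cached[mid].compareTo(word) <= 0) {
                block = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        if (block < 0) return -1; // word sorts before the first entry

        // Phase 2 (touches the file): binary search inside that one block.
        int from = block * step;
        int to = Math.min(from + step, size) - 1;
        while (from <= to) {
            int mid = (from + to) >>> 1;
            int cmp = readKey.apply(mid).compareTo(word);
            if (cmp == 0) return mid;
            if (cmp < 0) from = mid + 1; else to = mid - 1;
        }
        return -1;
    }
}
```

With a step of 16, the second phase reads at most 5 entries from the file per lookup, and the cache for 70,000 entries holds about 4,400 strings.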
Upvotes: 3