Reputation: 5
I know one way to solve this question is to Hash the words and its corresponding word count. Then traverse the Hash map and figure out the top 3.
Is there any better way to solve this ? Will it be better if I use a BST instead of a HashMap ?
Upvotes: 0
Views: 347
Reputation: 178511
Basically a histogram is the standard way of doing so, have your pick of which implementation you want to use for the histogram interface, the difference between them is actually instance specific - each has its advantages and disadvantages.
You might also want to consider a map-reduce design to get the words count:
map(doc):
for each word:
emitIntermediate(word,"1")
reduce(word,list<string>):
emit(word,size(list))
This approach allows great scalability if you have a lot of documents - using the map-reduce interface, or elegant solution if you like functional programming.
Note: this approach is basically same as the hash solution, since the mapper is passing the (key,values)
tuple using hashing.
Upvotes: 0
Reputation: 91983
A Trie is a good datastructure for this. No need for hash calculations and its time complexity for inserts and updates is O(1)
in the size of the dictionary.
Upvotes: 1
Reputation: 36476
I would wager that a hash table would have better performance in this case since there are likely many different words. Lookups will take O(1)
over O(log N)
.
Upvotes: 0
Reputation: 6740
Either a HashMap or a BST are a reasonable choice. Performance of each will vary depending upon the number of words you need to count over. A profiler is your friend in these instances (VisualVM is a reasonable choice to start with).
Upvotes: 0