archana29
archana29

Reputation: 5

top 3 word count in a text editor

I know one way to solve this question is to Hash the words and its corresponding word count. Then traverse the Hash map and figure out the top 3.

Is there any better way to solve this ? Will it be better if I use a BST instead of a HashMap ?

Upvotes: 0

Views: 347

Answers (4)

amit
amit

Reputation: 178511

Basically a histogram is the standard way of doing so, have your pick of which implementation you want to use for the histogram interface, the difference between them is actually instance specific - each has its advantages and disadvantages.

You might also want to consider a map-reduce design to get the words count:

map(doc):
   for each word:
     emitIntermediate(word,"1")

reduce(word,list<string>):
   emit(word,size(list))

This approach allows great scalability if you have a lot of documents - using the map-reduce interface, or elegant solution if you like functional programming.

Note: this approach is basically same as the hash solution, since the mapper is passing the (key,values) tuple using hashing.

Upvotes: 0

Emil Vikstr&#246;m
Emil Vikstr&#246;m

Reputation: 91983

A Trie is a good datastructure for this. No need for hash calculations and its time complexity for inserts and updates is O(1) in the size of the dictionary.

Upvotes: 1

tskuzzy
tskuzzy

Reputation: 36476

I would wager that a hash table would have better performance in this case since there are likely many different words. Lookups will take O(1) over O(log N).

Upvotes: 0

Alex Wilson
Alex Wilson

Reputation: 6740

Either a HashMap or a BST are a reasonable choice. Performance of each will vary depending upon the number of words you need to count over. A profiler is your friend in these instances (VisualVM is a reasonable choice to start with).

Upvotes: 0

Related Questions