FatFockFrank

Reputation: 169

Best data structure to count letter frequencies?

Task:

What is the most common first letter found in all the words in this document?

-unweighted (count a word once regardless of how many times it shows up)

-weighted (count a word separately for each time it shows up)

What is the most common word of a given length in this document?

I'm thinking of using a hashmap to count the most common first letter. But should I use a hashmap for both the unweighted and weighted counts?

And for the most common word of a given length (e.g. 5), could I use something simpler like an array list?

Upvotes: 0

Views: 1757

Answers (1)

Jim Mischel

Reputation: 134005

For the unweighted, you need a hash table to keep track of the words you've already seen, as well as a hash map to count the occurrences of the first letter. That is, you need to write:

if words_seen does not contain word
    add word to words_seen
    update hash map with first letter of word
end-if
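The pseudocode above could be sketched in Java roughly like this (a minimal illustration using `HashSet` and `HashMap`; the word list, class, and method names are just placeholders):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class FirstLetters {
    // Unweighted: each distinct word contributes once to its first letter's count.
    static Map<Character, Integer> unweighted(List<String> words) {
        Set<String> wordsSeen = new HashSet<>();
        Map<Character, Integer> counts = new HashMap<>();
        for (String word : words) {
            if (wordsSeen.add(word)) {               // add() returns false for repeats
                counts.merge(word.charAt(0), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // "apple" appears twice but is only counted once.
        System.out.println(unweighted(List.of("apple", "apple", "ant", "bee")));
    }
}
```

`Set.add` already tells you whether the word was new, so you don't need a separate `contains` check.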

For the weighted, you don't need that hash table, because you count the word every time it occurs, whether or not you've seen it before. So you can just write:

update hash map with first letter of word
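In Java that one-liner-per-word version might look like this (again just a sketch with placeholder names):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WeightedFirstLetters {
    // Weighted: every occurrence of a word contributes to its first letter's count.
    static Map<Character, Integer> weighted(List<String> words) {
        Map<Character, Integer> counts = new HashMap<>();
        for (String word : words) {
            counts.merge(word.charAt(0), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // "apple" appears twice, so 'a' is counted for both occurrences.
        System.out.println(weighted(List.of("apple", "apple", "ant", "bee")));
    }
}
```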

For the most common word of a given length, you need a hash map that maps each unique word to the number of times you've seen it. After you've scanned the entire document, make a pass through that hash map to find the most frequent word of the desired length.
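That two-pass approach could be sketched like this (a minimal example; the method name and sample words are made up for illustration):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CommonWordOfLength {
    // Pass 1: count every word. Pass 2: scan the map for the most
    // frequent word of the given length.
    static String mostCommonOfLength(List<String> words, int length) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            counts.merge(word, 1, Integer::sum);
        }
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getKey().length() == length && e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;   // null if no word of that length exists
    }

    public static void main(String[] args) {
        List<String> doc = List.of("apple", "grape", "apple", "melon");
        System.out.println(mostCommonOfLength(doc, 5));   // apple
    }
}
```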

You probably don't want to use an array list for the last task, because you need to count occurrences. With an array list you'd append every word as you scan, then sort the whole list and count runs of equal words afterward. That takes more memory (every occurrence is stored) and more time (an O(n log n) sort versus the hash map's single O(n) pass).

Upvotes: 1
