Jimmy Zelinskie

Reputation: 1575

What's the largest size a hashtable should be?

How large is too large for the average programming language's implementation of a hashtable?

Say I wanted to create a program that plays the game Shiritori. After the user inputs a word, the program needs to check a dictionary to see whether that word exists. To prevent constant flat-file reads, is loading 100,000+ words into a hashtable at program start a wise solution?
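For concreteness, here is a minimal sketch of what that might look like in C#; the file name words.txt and the one-word-per-line format are assumptions, not part of the question:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

class ShiritoriDictionary
{
    static void Main()
    {
        // Load every word once at startup; "words.txt" (one word per line) is an assumed file name.
        var words = new HashSet<string>(
            File.ReadLines("words.txt"),
            StringComparer.OrdinalIgnoreCase);

        Console.WriteLine("Loaded " + words.Count + " words.");

        // Each later lookup is a hash probe instead of a file scan.
        Console.Write("Enter a word: ");
        string input = Console.ReadLine();
        Console.WriteLine(words.Contains(input) ? "Valid word." : "Not in the dictionary.");
    }
}
```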

Upvotes: 4

Views: 14462

Answers (3)

Salvatore Previti

Reputation: 9050

Well, there are specialized data structures and algorithms for this kind of data. For example, the Patricia trie or the radix tree is far more space-efficient than a hash table for strings, but of course, being a tree, its lookup complexity is O(log n) and building it is O(n log n). Since you are loading it from a file, however, you can write your file in such a way that you can load it in O(n).
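For illustration only, here is a minimal, uncompressed trie sketch in C#; a Patricia/radix tree would additionally collapse chains of single-child nodes into string-labelled edges, which this sketch does not do:

```csharp
using System.Collections.Generic;

// Minimal uncompressed trie: one character edge per child.
// A Patricia/radix tree would merge single-child chains to save space.
class TrieNode
{
    public readonly Dictionary<char, TrieNode> Children = new Dictionary<char, TrieNode>();
    public bool IsWord;
}

class Trie
{
    private readonly TrieNode _root = new TrieNode();

    public void Add(string word)
    {
        var node = _root;
        foreach (char c in word)
        {
            TrieNode next;
            if (!node.Children.TryGetValue(c, out next))
            {
                next = new TrieNode();
                node.Children[c] = next;
            }
            node = next;
        }
        node.IsWord = true;
    }

    public bool Contains(string word)
    {
        var node = _root;
        foreach (char c in word)
        {
            if (!node.Children.TryGetValue(c, out node))
                return false;
        }
        return node.IsWord;
    }
}
```

Each lookup walks at most one node per character of the query word, and shared prefixes are stored only once, which is the idea behind the space savings of the compressed variants.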

The Hashtable (Dictionary) in C# is implemented in such a way that it has no upper bound other than its internal 32-bit integer addressing (it certainly cannot hold more than about 2 billion items).

100,000 items are not too much for a dictionary. What may be more problematic for languages with a garbage collector is that you will have 100,000 allocated strings, which puts some pressure on your GC. You can only find out the application's real memory footprint by running it.

If memory is a real concern, look at the Patricia trie and the radix tree, which are perfect for storing word dictionaries. But you can start with a dictionary and see how much memory your application actually uses.

Doing a rough calculation, treating the strings as Unicode, taking the average English word as 5.1 letters (something I read on the web), and adding 32 bytes of overhead (object header and length) per string, you get a minimum of 100,000 * (32 + 5 * 2) = 4,200,000 bytes for the strings, which is a really small amount.
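Along those lines, a rough way to check the actual footprint at run time is to compare GC.GetTotalMemory before and after loading the words; a sketch, again assuming a one-word-per-line words.txt file:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

class MemoryCheck
{
    static void Main()
    {
        long before = GC.GetTotalMemory(true); // force a collection for a cleaner baseline

        var words = new HashSet<string>(File.ReadLines("words.txt"));

        long after = GC.GetTotalMemory(true);
        Console.WriteLine("Words loaded:  " + words.Count);
        Console.WriteLine("Approx. bytes: " + (after - before));
    }
}
```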

Upvotes: 6

Dave

Reputation: 6179

Physical limitations (RAM) and implementation limitations (Java hash map vs. C# hash map vs. STL or Boost, etc.) aside, I think the upper limit on how large a hash table "should" be depends on the hashing algorithm. The original intent of a hash map is to achieve constant lookup time as the size of the collection grows. If you have a good hashing algorithm, then you can generate distinct keys for a large number of values; but if you have a bad hashing algorithm, your lookup time goes to crap as you start to have collisions (e.g., two unique inputs to your hashing algorithm generate the same value) and you get into the trickery needed to avoid them.
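To make the collision point concrete, here is a small hypothetical C# illustration (not from the question): a key type whose GetHashCode always returns the same value forces every entry into one bucket, so inserts and lookups degrade from roughly constant toward linear time:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical key with a deliberately terrible hash function:
// every instance hashes to 0, so all entries collide in one bucket.
class BadKey
{
    public readonly int Value;
    public BadKey(int value) { Value = value; }
    public override int GetHashCode() { return 0; }
    public override bool Equals(object obj)
    {
        var other = obj as BadKey;
        return other != null && other.Value == Value;
    }
}

class CollisionDemo
{
    static void Main()
    {
        const int n = 20000;
        var good = new Dictionary<int, bool>();
        var bad = new Dictionary<BadKey, bool>();

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < n; i++) good[i] = true;
        Console.WriteLine("Well-distributed hashes:        " + sw.ElapsedMilliseconds + " ms");

        sw.Restart();
        for (int i = 0; i < n; i++) bad[new BadKey(i)] = true;
        Console.WriteLine("Constant hash (all collisions): " + sw.ElapsedMilliseconds + " ms");
    }
}
```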

But that shouldn't be what you're looking for. I'm just throwing this out there to add another point to the discussion that I don't think has been addressed yet. I think you should look into @Salvatore Previti's answer. Given the problem you have, the solution he mentioned seems to be a better fit.

Upvotes: 0

Michael Lorton

Reputation: 44376

"Too large"? That's like asking, "What is the best-tasting food?"

The larger the hashtable, the more memory it takes up, but the faster it runs. You have to decide which you need more, space or time.
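One way to read that tradeoff in C#, as a sketch: you can spend memory up front by giving the dictionary an initial capacity, which avoids repeated resize-and-rehash work while it fills (the capacity value here is just an illustration, not something from the question):

```csharp
using System.Collections.Generic;

class CapacityDemo
{
    static void Main()
    {
        // Pre-sizing trades memory for time: the bucket array is allocated once,
        // so inserting ~100,000 entries never triggers a resize-and-rehash.
        var presized = new Dictionary<string, bool>(100000);

        // The default constructor starts small and rehashes repeatedly as it grows.
        var growing = new Dictionary<string, bool>();
    }
}
```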

Upvotes: -1
