user3803380

Reputation: 13

What happens when hash tables generate huge hashcodes?

I'm a little confused about hashmaps/hashtables, because everywhere I look I can't find the answer I'm looking for. I still don't know whether hashmaps use a static array or some kind of dynamic lookup. This is how I first thought hashmaps were stored:

[Bucket 0] Object 1
[Bucket 4] Object 2
[Bucket 8000] Object 3

I thought that to get the value from bucket 8000 you would have to search through every bucket key before it until you reached 8000. Now, after reading and watching some videos, I'm thinking it's stored as a plain array, like:

[Bucket 0] Object 1
[...3 buckets in between]
[Bucket 4] Object 2
[...7795 buckets in between]
[Bucket 8000] Object 3

But then anything that generated huge hash codes would waste tons of memory. Then somewhere I learned that it might use the % operator to keep the indexes from getting too big, but then each bucket would be more likely to have a collision. So which is it? I would have thought the most sensible one was idea number one, but I'm not sure. Thanks.
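The third idea can be sketched directly. Assuming a fixed bucket count of, say, 16 (the names `bucketFor` and `numBuckets` here are illustrative, not any real API), the % operator maps any hash code into bounds:

```java
public class ModuloDemo {
    static int bucketFor(int hashCode, int numBuckets) {
        // Math.floorMod keeps the result non-negative even for
        // negative hash codes, which plain % would not
        return Math.floorMod(hashCode, numBuckets);
    }

    public static void main(String[] args) {
        System.out.println(bucketFor(8000, 16)); // a huge code still fits in [0, 16)
        System.out.println(bucketFor(-42, 16));  // negative codes map in-bounds too
    }
}
```

So bucket 8000 never needs to exist as a physical slot; a huge hash code just wraps around into one of the 16 real buckets, at the cost of possible collisions.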

Upvotes: 0

Views: 83

Answers (1)

arshajii

Reputation: 129507

The hash code is taken modulo the array length to generate an in-bounds index (HashMap in particular first applies a secondary hash function to the given hash code to protect against poorly written hash functions). If the hash table begins to fill up past a certain threshold, the array is expanded and all of the elements are re-inserted. Collisions are possible, but with a good hash function they should be rare.
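As a rough sketch of that index computation (assuming a power-of-two table size, which lets the modulo reduce to a bit mask; the method names `spread` and `indexFor` are illustrative, not the real HashMap API):

```java
public class HashIndexSketch {
    // Secondary hash: fold the high bits into the low bits so that
    // hash codes differing only in their high bits still spread
    // across different buckets.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // capacity must be a power of two; (capacity - 1) is then a
    // low-bit mask, and & acts as a non-negative modulo.
    static int indexFor(int hashCode, int capacity) {
        return spread(hashCode) & (capacity - 1);
    }

    public static void main(String[] args) {
        int capacity = 16;
        // Huge hash codes still land in [0, capacity):
        System.out.println(indexFor(8000, capacity));
        System.out.println(indexFor(1_234_567, capacity));
    }
}
```

When the table fills past its threshold, `capacity` doubles and every entry's index is recomputed against the new mask, which is the re-insertion step described above.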

Upvotes: 1
