Confusion about finding information in a hash table when there is a collision

Question

I understand that if there is a collision in a hashtable you have a few options of storing the data. You could use some prime number to linearly traverse the array until you find a free spot. You could also rehash the entire table into a larger array. I'm sure there are other ways. What I don't understand is if there is a collision in the first place how would you know which row of data is the one you were looking for? Would I just not allow duplicate keys to be used?

Bernhard Barker · Accepted Answer

There's a big difference between a hash and a key (although they could sometimes be the same).

The key could be a very large number, a complex object consisting of many fields or anything really.
You apply your hash function to this key to get a hash.

So even if you disallow duplicate keys, you could still have duplicate hashes.

You often can't use your key as a hash directly because array indices are consecutive integers starting at 0, so it won't work if your key is too large, negative or not an integer, and you'll have to apply some sort of hash function.

If you want to store numbers between 1 and 10000, you would let the key be the number itself and could make the hash the remainder of the number divided by 1000 (and you'd thus have an array of size 1000 for the hash table).

Inserting 1001 will put it at index 1. If you try to insert 2001, it will also try to go to index 1 and you'll have a collision.

* The key could either be the entire value you want to store or only an identifier for it.

Confusion about finding information in a hash table when there is a collision

Answers (1)

Related Questions