Reputation: 8811
For example, if writing a dictionary class, collisions are rare, but they do exist. As a result, you need to store the key to make sure that when you find your key in the hashtable, it is the right one and not a collision.
Sometimes the keys are long and they're usually strings, so each key can be over 40 bytes, compared to if it was just a hash code. What if the key stored was the object hashed but using a slightly different hashing algorithm, with different prime numbers? Then the chances of a collision would be (1/(2^32)) * (1/(2^32))
.
You can even have another hashing algorithm and store that hash instead, so the chances of a collision would be (1/(2^32)) * (1/(2^32)) * (1/(2^32))
. Obviously a collision could STILL happen, but the chances are so low, and you save so much memory by only having to store a 4 bytes for a key instead of over 32 bytes.
I guess this is still not acceptable though, right, because there's still a CHANCE, but there's also a chance someone's RAM could accidentally flip a bit and bluescreen, and this just seems so unlikely it's tempting not to implement. Are there any alternatives or is the small chance still not worth it?
Upvotes: 1
Views: 302
Reputation: 2883
It depends.
Do you absolutely need to guarantee collision resolution? If so: you have to store the key, or something equivalent to it. In some cases (e.g. small keyspace, redundant data, etc.) you can use compression or custom hash functions to reversibly map the key to something smaller.
If not: yes, your approach would work. Note that, due to the birthday paradox, the probability of collision is:
There's a tradeoff: now you have to compute (and compare) several hashes to find items.
Following further down this path: why have a fixed number of hashes? You could compute one hash, and only compute the next if there's a collision; this leads to a trie-based implementation. (Of course, you then need a reliably-distributed family of hash functions...)
Most of this is overkill for all but the most high-performance and/or memory-constrained applications - but it is occasionally useful to do things like this :)
Upvotes: 1
Reputation: 184
If you want to make 100% sure there arent any collison there is no way arround checking for the key before inserting. That being said, we're lucky here because a well implemented dictionnary is exactly what you need in order to quickly find a key.
That being said, you might want to take a look at the function described here. The collision chances will be rather low
EDIT: Removed some non-sense I wrote about GUIDs...
Upvotes: 2