Reputation: 2320
Is the chance of hash-collisions growing, when you hash the object multiple times?
Meaning, is the chance of collisions higher for hash(hash(object))
than for hash(object)
?
Upvotes: 1
Views: 741
Reputation: 6276
Depends on what exactly you mean.
If the hash changes due to the rehashing, then yes, if it doesn't, then no.
If the object did not change and you rehash it, it will keep the same hash. So for example, the md5 hash of the string teststring
will always be D67C5CBF5B01C9F91932E3B8DEF5E5F8
.
But if the object changed and you rehash because of that, you will get a new hash.
Now, if you rehash an object that has changed, there might be a higher chance of collisions.
Say for example you have a very simple object containing only one integer value and a very simple hashing algorithm which just takes this value and does a modulo 20
on it. This is an intentionally bad hashing algorithm for this example only.
Now say you have two objects containing a random number. The chance of a hash collision for these two values is 1/20
, because you have 20 buckets in the hashing algorithm.
If you now rehash, you once again have a chance of 1/20
chance for a collision, or a 19/20
chance for no collision.
So the chance for no collision after n
rehashes is (19/20)^(n+1)
. So after the first rehash (so you have your original values and rehash one of the values once after it changed) you have a 90.25%
chance that you don't have a collision. After the second rehash you are down to 85.76%
chance that you don't have any collisions. After 100 rehashes you are down to only a 0.59%
chance of no collisions.
That is all depending on that the values change to a new value before each rehash.
Another way to prove that is this:
Upvotes: 2