F.M.F.
F.M.F.

Reputation: 2320

Hash-collision: Chance growing with multiple hashing

Is the chance of hash-collisions growing, when you hash the object multiple times?

Meaning, is the chance of collisions higher for hash(hash(object)) than for hash(object)?

Upvotes: 1

Views: 741

Answers (1)

Dakkaron
Dakkaron

Reputation: 6276

Depends on what exactly you mean.

If the hash changes due to the rehashing, then yes, if it doesn't, then no.

If the object did not change and you rehash it, it will keep the same hash. So for example, the md5 hash of the string teststring will always be D67C5CBF5B01C9F91932E3B8DEF5E5F8.

But if the object changed and you rehash because of that, you will get a new hash.

Now, if you rehash an object that has changed, there might be a higher chance of collisions.

Say for example you have a very simple object containing only one integer value and a very simple hashing algorithm which just takes this value and does a modulo 20 on it. This is an intentionally bad hashing algorithm for this example only.

Now say you have two objects containing a random number. The chance of a hash collision for these two values is 1/20, because you have 20 buckets in the hashing algorithm.

If you now rehash, you once again have a chance of 1/20 chance for a collision, or a 19/20 chance for no collision.

So the chance for no collision after n rehashes is (19/20)^(n+1). So after the first rehash (so you have your original values and rehash one of the values once after it changed) you have a 90.25% chance that you don't have a collision. After the second rehash you are down to 85.76% chance that you don't have any collisions. After 100 rehashes you are down to only a 0.59% chance of no collisions.

That is all depending on that the values change to a new value before each rehash.

Another way to prove that is this:

  • Hashing algorithms give you a limited amount of buckets (=different possible hashes)
  • You can feed your hashing algorithm with an infinite amount of different values
  • Each value needs to be mapped to a bucket
  • If you have an infinite amount of values mapped to a finite amount of buckets, there will be collisions at some time.

Upvotes: 2

Related Questions