Reputation: 944
So I understand that there is proof that MD5 can not guarantee uniqueness as there are more strings in the universe than there are MD5 hash strings, but is there any inverse proof for a finite number of strings?
Basically, if I have strings of maximum length of X, is there an X for which MD5 is guaranteed to be unique? if yes, then what is that X? and if there are more than one values for X, what is the maximum value of X?
or is there such an X for any other hashing algorithm, SHA-1, etc.?
Upvotes: 11
Views: 3673
Reputation: 373452
The answer to your question is yes. For any hash function, there is a maximum length X for which you will get back unique strings. Finding X, though, could be very hard. The idea is to run this program:
X= 0;
For i = 0 onward
For all strings of length i
Compute the hash code of that string.
If a collision is found, return X.
X = i
The idea is to just list longer and longer strings until you find a hash collision. Eventually you'll have to, since eventually you'll have generated more strings than there are possible hash outputs.
On expectation, assuming that a hash function is actually pretty random, you'll need to generate O(√U) different strings before you find a collision, where U is the size of the space to which the hash function maps. For 256-bit hashes, this is 2256. This means that in practice the above program would never actually terminate unless the hash function was pretty broken, but in theory it means that your number X exists.
Hope this helps!
Upvotes: 1
Reputation: 4042
Summarizing the excellent answers here: What's the shortest pair of strings that causes an MD5 collision?
The shortest known attack on MD5 requires 2 input blocks, that is, 128 bytes or 1024 bits.
For any hash algorithm that outputs N bits, assuming it distributes inputs approximately randomly, you can assume a collision is more than 50% likely in about sqrt(2^N)
inputs. For example, MD5 hashes to 128 bits, so you can expect a collision among all 64 bit inputs. This assumes a uniformly random hash. Any weaknesses would reduce the number of inputs before a collision can be expected to occur.
Upvotes: 3