Reputation: 11796
SHA-1 is considered safer than MD5 for at least two reasons: bigger hash (160 bits vs 128 bits) and better hash function.
I need to generate hashes on a few million strings. Generally the chance for a collision even for MD5 should be very low. I am aware that MD5 is pretty old and not considered secure in general, but in my case deliberate attacks are not a concern (no outside access, no incentive, etc.). I just need a reasonably safe hash function w/o wasting too many bits and 128 bits should be more than enough. So I was wondering - if I only got the first 128 bits of a SHA-1 hash, would that be better than the 128 bits of MD5? By "better" I mean "less likely to cause a collision".
Upvotes: 4
Views: 2355
Reputation: 11796
I ran a few "real-world" tests with 4,292,907 different strings. I used a 11-char long substring of the HEXed hash (in other words, a 44-bit portion). Example:
HASH: 629a09633488e9b2aaf2f5a606706da3
Test 1: 629a0963348
Test 2: 29a09633488
Test 3: 9a09633488e
...
Theoretically, I calculated the probability of collision to be ~41% (based on the "birthday paradox probability" formula). But that was theory, which assumes real random distribution. So I wanted to empirically test both MD5 and SHA-1 and see the results. Here they are (the numbers on the right show the number of collisions):
[MD5] [SHA-1]
Test No 1: 2 Test No 1: 0
Test No 2: 0 Test No 2: 0
Test No 3: 1 Test No 3: 0
Test No 4: 0 Test No 4: 1
Test No 5: 0 Test No 5: 0
Test No 6: 0 Test No 6: 1
Test No 7: 1 Test No 7: 0
Test No 8: 2 Test No 8: 0
Test No 9: 1 Test No 9: 0
Test No 10: 1 Test No 10: 0
Test No 11: 0 Test No 11: 1
Test No 12: 0 Test No 12: 1
Test No 13: 0 Test No 13: 0
Test No 14: 0 Test No 14: 1
Test No 15: 0 Test No 15: 1
Test No 16: 0 Test No 16: 1
Test No 17: 1 Test No 17: 1
Test No 18: 1 Test No 18: 1
Test No 19: 0 Test No 19: 0
Test No 20: 0 Test No 20: 1
TOTAL: 8 TOTAL: 10 // No of tests with at least 1 collision
Conclusion: Neither MD5 nor SHA-1 showed significantly worse probability of collision, compared to the "theoretical" one calculated via the "birthday paradox probability" formula. I am aware this test isn't perfect and should be taken with a grain of salt, but to me, at least, it shows that I can heavily rely on calculating the chance of collision via the "theoretical" formula w/o worrying that my calculations are too far from the truth.
Upvotes: 4