at54321
at54321

Reputation: 11796

Are 128 bits of SHA-1 hash safer than an MD5 hash?

SHA-1 is considered safer than MD5 for at least two reasons: bigger hash (160 bits vs 128 bits) and better hash function.

I need to generate hashes on a few million strings. Generally the chance for a collision even for MD5 should be very low. I am aware that MD5 is pretty old and not considered secure in general, but in my case deliberate attacks are not a concern (no outside access, no incentive, etc.). I just need a reasonably safe hash function w/o wasting too many bits and 128 bits should be more than enough. So I was wondering - if I only got the first 128 bits of a SHA-1 hash, would that be better than the 128 bits of MD5? By "better" I mean "less likely to cause a collision".

Upvotes: 4

Views: 2355

Answers (1)

at54321
at54321

Reputation: 11796

I ran a few "real-world" tests with 4,292,907 different strings. I used a 11-char long substring of the HEXed hash (in other words, a 44-bit portion). Example:

  HASH: 629a09633488e9b2aaf2f5a606706da3
Test 1: 629a0963348
Test 2:  29a09633488
Test 3:   9a09633488e
...

Theoretically, I calculated the probability of collision to be ~41% (based on the "birthday paradox probability" formula). But that was theory, which assumes real random distribution. So I wanted to empirically test both MD5 and SHA-1 and see the results. Here they are (the numbers on the right show the number of collisions):

[MD5]             [SHA-1]
Test No  1: 2     Test No  1: 0
Test No  2: 0     Test No  2: 0
Test No  3: 1     Test No  3: 0
Test No  4: 0     Test No  4: 1
Test No  5: 0     Test No  5: 0
Test No  6: 0     Test No  6: 1
Test No  7: 1     Test No  7: 0
Test No  8: 2     Test No  8: 0
Test No  9: 1     Test No  9: 0
Test No 10: 1     Test No 10: 0
Test No 11: 0     Test No 11: 1
Test No 12: 0     Test No 12: 1
Test No 13: 0     Test No 13: 0
Test No 14: 0     Test No 14: 1
Test No 15: 0     Test No 15: 1
Test No 16: 0     Test No 16: 1
Test No 17: 1     Test No 17: 1
Test No 18: 1     Test No 18: 1
Test No 19: 0     Test No 19: 0
Test No 20: 0     Test No 20: 1
TOTAL: 8          TOTAL: 10           // No of tests with at least 1 collision

Conclusion: Neither MD5 nor SHA-1 showed significantly worse probability of collision, compared to the "theoretical" one calculated via the "birthday paradox probability" formula. I am aware this test isn't perfect and should be taken with a grain of salt, but to me, at least, it shows that I can heavily rely on calculating the chance of collision via the "theoretical" formula w/o worrying that my calculations are too far from the truth.

Upvotes: 4

Related Questions