daniels

Reputation: 19193

Why, in some cases, are only the first x chars of an md5 hash used instead of all of them?

For example, the commit list on GitHub shows only the first 10, and this line from tornadoweb uses only 5:

return static_url_prefix + path + "?v=" + hashes[abs_path][:5]

Are only the first 5 chars enough to make sure that 2 different hashes for 2 different files won't collide?

LE: The example above from tornadoweb uses an md5 hash to generate a query string for static file caching.
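To make the setup concrete, here is a minimal sketch of that kind of cache-busting URL. It is not tornadoweb's actual code: the function name `versioned_url` and its parameters are hypothetical, and it only assumes that the version string is a prefix of the file content's md5 hex digest.

```python
import hashlib


def versioned_url(static_url_prefix, path, content, length=5):
    """Build a static URL whose ?v= value is a truncated md5 of the content.

    Hypothetical helper for illustration; `length` is the number of hex
    characters kept from the 32-character md5 hex digest.
    """
    digest = hashlib.md5(content).hexdigest()
    return static_url_prefix + path + "?v=" + digest[:length]


# The query string changes whenever the first `length` hex chars change.
print(versioned_url("/static/", "app.js", b"hello"))
print(versioned_url("/static/", "app.js", b"hello, world"))
```

Here a collision would only mean that a changed file keeps the same `?v=` value, so a browser could keep serving its cached copy.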

Upvotes: 2

Views: 699

Answers (1)

Seth

Reputation: 772

In general, no.

In fact, even a full MD5 hash wouldn't be enough to prevent malicious users from generating collisions: MD5 is broken. Even with a better hash function, five characters is not enough.

But sometimes you can get away with it.

I'm not sure exactly what the context of the specific example you provided is. However, to answer your more general question: if there aren't bad guys actively trying to cause collisions, then using part of the hash is probably okay. In particular, given 5 hex characters (20 bits), you wouldn't expect collisions before around 2^(20/2) = 2^10 ≈ one thousand values are hashed. This is a consequence of the birthday paradox.

The previous paragraph assumes the hash function is essentially random. This is not an assumption anyone trying to make a cryptographically secure system should make. But as long as no one is intentionally trying to create collisions, it's a reasonable heuristic.
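As a rough check of the birthday estimate, here is a small sketch that uses the standard approximation 1 - exp(-n(n-1)/(2N)) for the chance of at least one collision among n values drawn uniformly from N possibilities. The function names are my own, and the uniform-randomness assumption is exactly the heuristic described above, not a guarantee.

```python
import math


def collision_probability(n_items, hex_chars):
    """Approximate chance that at least two of n_items share a truncated hash.

    Assumes the truncated hash behaves like a uniform random value over
    16**hex_chars possibilities (a heuristic, not a security claim).
    """
    space = 16 ** hex_chars
    return 1.0 - math.exp(-n_items * (n_items - 1) / (2.0 * space))


def items_for_even_odds(hex_chars):
    """Approximate number of items at which a collision becomes 50% likely."""
    space = 16 ** hex_chars
    return math.sqrt(2.0 * space * math.log(2.0))


for n in (100, 1024, 5000):
    print(n, round(collision_probability(n, 5), 4))
print(round(items_for_even_odds(5)))
```

With 5 hex characters, this gives roughly a 39% chance of at least one collision at 1,024 values and even odds at around 1,200, which matches the "about a thousand" rule of thumb.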

Upvotes: 4
