g.revolution

Reputation: 12262

Is using java.util.UUID after hashing it with MD5 a good option?

I saw the following code.

.
. // some code
.
String guid = NetworkUtil.md5(java.util.UUID.randomUUID().toString());
.
. // guid is being used 
.

Is this a good approach, hashing a Version 4 UUID with MD5?

According to UUID specification, UUID itself is very good in uniqueness of the generated UUID's and chances of collision are very very very small. So isn't this above piece of code actually reducing the quality of uniqueness by hashing it with MD5 which is an obsolete hashing mechanism now and prone to collisions and attacks.

Upvotes: 3

Views: 3479

Answers (2)

Stephen C

Reputation: 719239

Let's start with this:

According to UUID specification, UUID itself is very good in uniqueness of the generated UUID's and chances of collision are very very very small.

Actually, it doesn't say that. It can't say that, because that does not make sense.

In fact, if the UUID spec says anything about the uniqueness of type 4 UUIDs, it would say that they are only as good as the source of random numbers. And that depends on the platform and the quality of the RNG and UUID implementations. If we can assume a perfect [1] source of random numbers, then the probability of any two (separately generated) UUIDs being the same is one in 2^122; i.e. very, very small. On the other hand, if you have a poor source of random numbers, the probability of pair-wise collision increases.
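To make the 122-bit figure concrete, here is a minimal sketch. It assumes an ideal random source, and the class and method names (`Uuid4Bits`, `approxCollisionProbability`) are made up for illustration. It checks the version and variant fields that account for the 6 non-random bits, and uses the usual birthday approximation to estimate the chance of any collision among n UUIDs.

```java
import java.util.UUID;

public class Uuid4Bits {
    // Of a UUID's 128 bits, 4 encode the version and 2 encode the variant,
    // leaving 122 bits that a version 4 generator fills with random data.
    static final int RANDOM_BITS = 128 - 4 - 2;

    // Birthday approximation (assumes a perfect random source): the chance of
    // at least one collision among n values drawn uniformly from 2^bits
    // possibilities is roughly n^2 / 2^(bits + 1), capped at 1.
    static double approxCollisionProbability(double n, int bits) {
        return Math.min(1.0, n * n / Math.pow(2.0, bits + 1));
    }

    public static void main(String[] args) {
        UUID id = UUID.randomUUID();
        // randomUUID() produces version 4, IETF variant (2) UUIDs.
        System.out.println(id + " version=" + id.version() + " variant=" + id.variant());
        System.out.println("pair-wise collision: " + Math.pow(2.0, -RANDOM_BITS));
        System.out.println("any collision among 1e9 UUIDs: "
                + approxCollisionProbability(1e9, RANDOM_BITS));
    }
}
```

These numbers only hold if the underlying RNG is good; a weak RNG makes the real probabilities larger than this estimate.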

So isn't this above piece of code actually reducing the quality of uniqueness by hashing it with MD5 which is an obsolete hashing mechanism now and prone to collisions and attacks.

Yes. But MD5 isn't the real problem.

As @Doug Stevenson says, hashing a UUID does not reduce the chance of a collision. Not even for a hashing algorithm that has no known weakness. Whatever the algorithm, there is a chance that hashing UUIDs will increase the probability of collisions [2].

So basically, there is no point in hashing a single UUID.
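For illustration, here is a rough sketch of what the code in the question is probably doing. The source of `NetworkUtil.md5` is not shown, so `md5Hex` below is a hypothetical stand-in that assumes a hex-encoded MD5 of the UTF-8 bytes. The hash is a deterministic function of the UUID string, so it adds no randomness. If the only goal is a 32-character hex string, stripping the dashes gives the same shape without the hash.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.UUID;

public class HashedUuidDemo {
    // Hypothetical stand-in for NetworkUtil.md5 (its real source is not shown
    // in the question): hex-encoded MD5 of the string's UTF-8 bytes.
    static String md5Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            // Zero-pad to 32 hex digits so leading zero bytes are kept.
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        String uuid = UUID.randomUUID().toString();
        String hashed = md5Hex(uuid);
        System.out.println(uuid + " -> " + hashed);
        // Same input, same output: the hash contributes no extra randomness.
        System.out.println("deterministic: " + hashed.equals(md5Hex(uuid)));
        // Same 32-hex-character format, with no hashing involved.
        System.out.println("without hashing: " + uuid.replace("-", ""));
    }
}
```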

However, if you required a token that had a smaller probability of collision than a type 4 UUID, you could concatenate N type 4 UUIDs into a single byte array, and then create a hash for the array. If you have a (strong) M-bit hashing algorithm, and a perfect source of random numbers for your UUID generator, then the chance of collision should be roughly one in 2^min(M, 122 * N).
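A minimal sketch of that idea follows, assuming SHA-256 as the strong hash (so M = 256). The class and method names (`MultiUuidToken`, `newToken`, `effectiveBits`) are invented for this example, and the estimate still depends on the quality of the RNG behind `UUID.randomUUID()`.

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.UUID;

public class MultiUuidToken {
    // Concatenate n version 4 UUIDs (16 bytes each) and hash the result.
    // SHA-256 is an assumed choice of "strong M-bit hash" with M = 256.
    static byte[] newToken(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        ByteBuffer buffer = ByteBuffer.allocate(16 * n);
        for (int i = 0; i < n; i++) {
            UUID id = UUID.randomUUID();
            buffer.putLong(id.getMostSignificantBits());
            buffer.putLong(id.getLeastSignificantBits());
        }
        try {
            return MessageDigest.getInstance("SHA-256").digest(buffer.array());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    // Exponent in the "one in 2^min(M, 122 * N)" estimate from the text.
    static int effectiveBits(int hashBits, int n) {
        return Math.min(hashBits, 122 * n);
    }

    public static void main(String[] args) {
        byte[] token = newToken(3);
        StringBuilder hex = new StringBuilder();
        for (byte b : token) {
            hex.append(String.format("%02x", b));
        }
        System.out.println(hex + " (about " + effectiveBits(256, 3) + " bits)");
    }
}
```

With SHA-256, going beyond N = 3 gains nothing by this estimate, since 122 * 3 already exceeds the 256-bit output size.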


1 - That is, a source of random bits where it is impossible for someone (an attacker) to predict the next bit in the sequence with anything other than 50% probability of being correct.

2 - This will happen if there are any two distinct UUIDs that have the same hash. That is possible even for a strong hashing algorithm ... unless you define that to be a criterion by which you measure strength.

Upvotes: 6

Doug Stevenson

Reputation: 317712

You can only make a probably-unique value worse by hashing it. It cannot get any better or "more unique". So, there is nothing to be gained by hashing like this, other than getting it into a uniform format that could be used where an MD5-hashed string is required.

Upvotes: 3
