MaYaN

Reputation: 6996

Is creating a Guid out of an MD5 hash instead of a string valid?

I am trying to implement a method for detecting duplicate files. I have an MD5 hashing method (let's ignore the fact that MD5 is broken) as below:

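using System.IO;
using System.Linq;
using System.Security.Cryptography;

using (MD5 hasher = MD5.Create())
using (FileStream fs = File.OpenRead("SomeFile"))
{
    // Hash the file contents; an MD5 digest is always 16 bytes.
    byte[] hashBytes = hasher.ComputeHash(fs);
    // Render the digest as 32 uppercase hex characters.
    string hashString = string.Join(string.Empty, hashBytes.Select(x => x.ToString("X2")));
}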

Instead of creating a string out of hashBytes, can I simply create a Guid out of it, like so?

Guid hashGuid = new Guid(hashBytes);

Would it still be valid or will I lose uniqueness?

Upvotes: 6

Views: 10703

Answers (3)

Theodor Zoulias

Reputation: 43812

Guids can guarantee uniqueness only if they are generated properly by calling Guid.NewGuid(). By constructing Guids from MD5 bytes you gain no uniqueness. You are only storing your bytes in a data structure named "Globally Unique IDentifier", and the values it holds may not be unique.

Do this experiment: create two Guids using the same byte array for both. Do you expect the Guids to be different or equal?
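A minimal sketch of that experiment, assuming an arbitrary placeholder byte array in place of a real MD5 digest (the class name is made up for illustration):

```csharp
using System;

public static class SameBytesDemo
{
    public static void Main()
    {
        // Placeholder bytes standing in for a 16-byte MD5 digest.
        byte[] bytes = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 };

        Guid first = new Guid(bytes);
        Guid second = new Guid(bytes);

        // Guid is a value type compared by its 128 bits,
        // so identical input bytes give equal Guids.
        Console.WriteLine(first == second); // True
    }
}
```

The two values compare equal, because a Guid built this way carries exactly the bytes it was given and nothing else.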

Upvotes: -3

Marc Gravell

Reputation: 1063619

An MD5 hash and a Guid essentially both express 128 bits of binary data, so:

  • plus: you won't lose any uniqueness
  • plus: the fact that Guid is a value type means that you avoid an allocation compared to string...
  • minus: ... but if you're going to display it anywhere, you might actually end up allocating multiple strings (i.e. rendering the same Guid multiple times)
  • minus: there is a semantic meaning to Guid that won't really be respected/expected here
  • minus: Guid default formatting isn't the same as how MD5 hashes are usually expressed
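  • minus: Guid endianness is a mess: the byte[] constructor and ToByteArray() treat the first three fields (4, 2 and 2 bytes) as little-endian, so the text form does not follow the raw byte order. If you want to get between raw bytes and any text representation, tread very carefully; it is not what you expect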
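To illustrate the formatting and endianness points, here is a small sketch that renders the same 16 bytes both ways. The byte values are placeholders rather than a real digest, and the class name is made up for illustration:

```csharp
using System;

public static class GuidByteOrderDemo
{
    public static void Main()
    {
        // Placeholder bytes standing in for a 16-byte MD5 digest.
        byte[] hashBytes =
        {
            0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
            0x08, 0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x0E, 0x0F
        };

        // Conventional MD5 rendering: bytes in order, two hex digits each.
        string hex = BitConverter.ToString(hashBytes).Replace("-", "").ToLowerInvariant();

        // Guid(byte[]) reads the first 4, 2 and 2 bytes as little-endian fields,
        // so those groups appear byte-swapped in the text form.
        string guidText = new Guid(hashBytes).ToString("N");

        Console.WriteLine(hex);      // 000102030405060708090a0b0c0d0e0f
        Console.WriteLine(guidText); // 030201000504070608090a0b0c0d0e0f
    }
}
```

The last eight bytes appear in the same order in both strings; only the first three groups differ. Comparing a Guid's text against an MD5 hex string produced by another tool will therefore not match unless the bytes are reordered first.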

Upvotes: 10

Can Poyrazoğlu

Reputation: 34810

Not sure if it's the best idea, but since both values are 128 bits, you wouldn't be losing any data, assuming that you aren't trying to convert the textual representation of the MD5 hash.

Just convert the MD5 bytes directly to a Guid, without converting them to a string first.
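A rough sketch of how that could look for duplicate detection. HashToGuid is a hypothetical helper, and in-memory buffers stand in for file contents so the example is self-contained:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

public static class Md5GuidDemo
{
    // Hypothetical helper: hashes a buffer and wraps the 16-byte digest in a Guid.
    public static Guid HashToGuid(byte[] content)
    {
        using (MD5 hasher = MD5.Create())
        {
            return new Guid(hasher.ComputeHash(content));
        }
    }

    public static void Main()
    {
        // In-memory stand-ins for file contents.
        byte[] a = Encoding.UTF8.GetBytes("same content");
        byte[] b = Encoding.UTF8.GetBytes("same content");
        byte[] c = Encoding.UTF8.GetBytes("other content");

        var seen = new HashSet<Guid>();
        Console.WriteLine(seen.Add(HashToGuid(a))); // True: first time seen
        Console.WriteLine(seen.Add(HashToGuid(b))); // False: same content as a
        Console.WriteLine(seen.Add(HashToGuid(c))); // True: different content

        // ToByteArray() reverses the byte[] constructor, so the digest is recoverable.
        using (MD5 hasher = MD5.Create())
        {
            byte[] digest = hasher.ComputeHash(a);
            Console.WriteLine(new Guid(digest).ToByteArray().SequenceEqual(digest)); // True
        }
    }
}
```

Because the constructor and ToByteArray() are inverses, the original digest bytes can always be recovered from the Guid when a conventional hex string is needed.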

Upvotes: 1
