Reputation: 6996
I am trying to implement a method for detecting duplicate files. I have an MD5 hashing method (let's ignore the fact that MD5 is broken) as below:
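```csharp
using System.IO;
using System.Linq;
using System.Security.Cryptography;
// Hash the file's contents and hex-encode the 16-byte MD5 digest.
using (MD5 hasher = MD5.Create())
using (FileStream fs = File.OpenRead("SomeFile"))
{
    byte[] hashBytes = hasher.ComputeHash(fs);
    string hashString = string.Join(string.Empty, hashBytes.Select(x => x.ToString("X2")));
}
```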
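The hashing snippet above could be wrapped into a duplicate finder roughly as follows. This is a minimal sketch, assuming C# 9 or later (top-level statements) and that files with equal MD5 hashes are treated as duplicates. `Md5Hex` and `FindDuplicates` are hypothetical helper names, not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

// Hypothetical helper: hex-encodes the MD5 of one file, as in the snippet above.
static string Md5Hex(string path)
{
    using MD5 hasher = MD5.Create();
    using FileStream fs = File.OpenRead(path);
    byte[] hashBytes = hasher.ComputeHash(fs);
    return string.Join(string.Empty, hashBytes.Select(x => x.ToString("X2")));
}

// Hypothetical helper: groups paths by hash and keeps only groups with more than one file.
static List<List<string>> FindDuplicates(IEnumerable<string> paths) =>
    paths.GroupBy(p => Md5Hex(p))
         .Where(g => g.Count() > 1)
         .Select(g => g.ToList())
         .ToList();

// Usage: three temporary files, two of them with identical content.
string dir = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N"));
Directory.CreateDirectory(dir);
string pathA = Path.Combine(dir, "a.txt");
string pathB = Path.Combine(dir, "b.txt");
string pathC = Path.Combine(dir, "c.txt");
File.WriteAllText(pathA, "hello");
File.WriteAllText(pathB, "hello");
File.WriteAllText(pathC, "world");

List<List<string>> dupes = FindDuplicates(new[] { pathA, pathB, pathC });
Console.WriteLine(dupes.Count); // one group: a.txt and b.txt

// Clean up the temporary directory; the path strings in dupes remain usable.
Directory.Delete(dir, recursive: true);
```

Equal hashes make identical content overwhelmingly likely rather than certain, so a byte-by-byte comparison inside each group is an optional final check.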
Instead of creating a string out of the hashBytes, can I simply create a Guid out of it, like so?
```csharp
Guid hashGuid = new Guid(hashBytes);
```
Would it still be valid or will I lose uniqueness?
Upvotes: 6
Views: 10703
Reputation: 43812
Guids can guarantee uniqueness only if they are generated properly, by calling Guid.NewGuid(). By constructing Guids from MD5 bytes you gain zero uniqueness: you only store your bytes in a data structure named "Globally Unique IDentifier", and the result could potentially be non-unique.
Do this experiment: create two Guids using the same byte array for both. Do you expect the Guids to be different or equal?
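The experiment can be run directly. This minimal sketch assumes C# 9 or later (top-level statements) and uses the MD5 of an arbitrary string as the shared byte array.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Any 16-byte array works; here it is the MD5 of a short string.
using MD5 md5 = MD5.Create();
byte[] hashBytes = md5.ComputeHash(Encoding.UTF8.GetBytes("same content"));

// Two Guids built from the same bytes.
Guid first = new Guid(hashBytes);
Guid second = new Guid(hashBytes);

// Guid is a value type with value equality, so both comparisons print True.
Console.WriteLine(first == second);
Console.WriteLine(first.Equals(second));
```

The Guid is exactly as unique as the bytes it was built from: identical file contents give identical hashes and therefore identical Guids, which is what duplicate detection needs.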
Upvotes: -3
Reputation: 1063619
MD5 hashes and Guid essentially both express 128 bits of binary, so:
- Guid is a value type, which means that you avoid an allocation compared to string
- …Guid … multiple times)
- Guid … that won't really be respected/expected here
- Guid default formatting isn't the same as how MD5 hashes are usually expressed
- Guid endianness is a mess, so if you want to get between raw bytes and any text representation, tread very carefully; it is not what you expect
Upvotes: 10
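The formatting and endianness points can be seen with a concrete hash. This is a minimal sketch, assuming C# 9 or later (top-level statements); `ToHex` and `ToGuidMatchingHex` are hypothetical helpers. It relies on new Guid(byte[]) reading its first three fields (4, 2 and 2 bytes) as little-endian, so reversing those groups first makes ToString("N") print the same digits as the usual MD5 hex string.

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Conventional MD5 text form: each byte as two lowercase hex digits, in array order.
static string ToHex(byte[] bytes) => string.Concat(bytes.Select(b => b.ToString("x2")));

// Hypothetical helper: reverse the 4-, 2- and 2-byte leading fields so the
// Guid's "N" formatting matches ToHex of the original digest.
static Guid ToGuidMatchingHex(byte[] digest)
{
    byte[] swapped = (byte[])digest.Clone();
    Array.Reverse(swapped, 0, 4);
    Array.Reverse(swapped, 4, 2);
    Array.Reverse(swapped, 6, 2);
    return new Guid(swapped);
}

byte[] hash;
using (MD5 md5 = MD5.Create())
{
    hash = md5.ComputeHash(Encoding.UTF8.GetBytes("hello"));
}

Guid naive = new Guid(hash);
Guid matching = ToGuidMatchingHex(hash);

Console.WriteLine(ToHex(hash));            // 5d41402abc4b2a76b9719d911017c592
Console.WriteLine(naive.ToString("N"));    // 2a40415d4bbc762ab9719d911017c592
Console.WriteLine(matching.ToString("N")); // 5d41402abc4b2a76b9719d911017c592
// The naive Guid still round-trips the raw bytes exactly.
Console.WriteLine(naive.ToByteArray().SequenceEqual(hash)); // True
```

Either Guid works as a lookup key; the swapped one only matters when the text form must match other MD5 tools. Note that `matching.ToByteArray()` returns the swapped bytes, not the original digest.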
Reputation: 34810
Not sure if it's the best idea, but since both values are 128 bits, you wouldn't be losing any data, assuming that you aren't trying to convert the textual representation of the MD5.
Just convert the MD5 bytes directly to a GUID, without converting them to a string first.
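A minimal sketch of the direct conversion, assuming C# 9 or later (top-level statements); `Md5` is a hypothetical helper. It checks that the 16 raw hash bytes survive the trip into a Guid unchanged, so distinct hashes stay distinct.

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper: MD5 of a UTF-8 string, returned as the raw 16 bytes.
static byte[] Md5(string text)
{
    using MD5 md5 = MD5.Create();
    return md5.ComputeHash(Encoding.UTF8.GetBytes(text));
}

byte[] h1 = Md5("file one");
byte[] h2 = Md5("file two");

// Convert the raw hash bytes directly; no string step in between.
Guid g1 = new Guid(h1);
Guid g2 = new Guid(h2);

// Lossless: ToByteArray returns exactly the bytes that went in.
Console.WriteLine(g1.ToByteArray().SequenceEqual(h1)); // True
// Different hashes therefore give different Guids.
Console.WriteLine(g1 != g2); // True
```

The Guid(byte[]) constructor requires exactly 16 bytes, which an MD5 digest always has; the 32-character hex text of the hash, encoded as bytes, would be rejected with an ArgumentException.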
Upvotes: 1