Reputation: 17176
I don't quite understand why Object.GetHashCode() returns different values for two identical byte arrays, but equal values for value types that are not IEnumerable. For example:
byte e = 123;
Console.WriteLine(e.GetHashCode());
byte f = 123;
Console.WriteLine(f.GetHashCode());
The output is
123
123
but when I do the same with byte arrays
byte[] a = new byte[3] { 1, 2, 3 };
Console.WriteLine(a.GetHashCode());
byte[] b = new byte[3] { 1, 2, 3 };
Console.WriteLine(b.GetHashCode());
the output is
46104728
12289376
Why is that, and how can I quickly compare two huge arrays without comparing every element?
Upvotes: 0
Views: 1992
Reputation: 420
Try using the SHA1CryptoServiceProvider.ComputeHash method. It takes a byte array and returns a SHA-1 hash, which is identical for identical byte arrays. Performance is good.
using System;
using System.Security.Cryptography;
string byte1hash;
string byte2hash;
using (SHA1CryptoServiceProvider sha1 = new SHA1CryptoServiceProvider())
{
    byte1hash = Convert.ToBase64String(sha1.ComputeHash(byteArray1));
    byte2hash = Convert.ToBase64String(sha1.ComputeHash(byteArray2));
}
if (string.Equals(byte1hash, byte2hash))
{
    // The hashes match, so the byte arrays are (almost certainly) the same.
}
If you are not worried about security, you can use MD5 instead.
Upvotes: 1
Reputation: 42343
GetHashCode is not overridden for array types, so you have to implement your own hash algorithm.
The value you see is based on the object reference (its identity), not on its contents, so two arrays with identical contents will almost always have different hash codes unless they are the same reference.
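To see the difference between identity-based and content-based comparison, here is a small sketch (assuming C# 9 or later, since it uses top-level statements). It relies on the framework's `StructuralComparisons.StructuralEqualityComparer`, which compares arrays element by element. One caveat worth hedging: in the reference source, the structural hash of an array only mixes in its last 8 elements, so it is fine for grouping equal arrays but is not a strong fingerprint of huge arrays.

```csharp
using System;
using System.Collections;

byte[] a = { 1, 2, 3 };
byte[] b = { 1, 2, 3 };

// Object.Equals on arrays is reference equality: these are two different objects.
Console.WriteLine(a.Equals(b));                   // False
Console.WriteLine(object.ReferenceEquals(a, b));  // False

// StructuralEqualityComparer compares the arrays element by element instead.
IEqualityComparer structural = StructuralComparisons.StructuralEqualityComparer;
Console.WriteLine(structural.Equals(a, b));                                // True
Console.WriteLine(structural.GetHashCode(a) == structural.GetHashCode(b)); // True
```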
For small integral types such as Byte and Int32, the hash code is simply the value converted to a 32-bit integer, which is why both bytes in the question print 123. For the 64-bit integral type, Int64, the lower 32 bits are XORed with the upper 32 bits (obtained by shifting right by 32) to produce the hash code.
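A worked example of the Int64 folding described above, as a sketch assuming C# 9 or later (top-level statements). `FoldInt64` is a hypothetical helper that reproduces the XOR of the two halves; whether it matches `long.GetHashCode()` exactly depends on the runtime, so the second line just prints the runtime's value for comparison rather than asserting it.

```csharp
using System;

// Hypothetical helper mirroring the folding described above: XOR the lower
// 32 bits with the upper 32 bits (obtained by shifting right by 32).
static int FoldInt64(long value) => unchecked((int)value ^ (int)(value >> 32));

long v = 0x100000002L;               // upper half = 1, lower half = 2
Console.WriteLine(FoldInt64(v));     // 3 (2 XOR 1)
Console.WriteLine(v.GetHashCode());  // the runtime's own Int64 hash, for comparison
```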
So when it comes to trying to compare two arrays 'quickly', you have to do it yourself.
You can use logic checks first: the lengths are equal, the arrays start and end with the same byte value, etc. Then you have a choice: either read byte by byte and compare values (or read 4 or 8 bytes at a time and use BitConverter to convert blocks of bytes to Int32 or Int64, giving a single pair of values per block that might be quicker to check for equality), or use a general-purpose hash function to get a good guess of equality.
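The cheap checks plus the 8-bytes-at-a-time comparison can be sketched like this (assuming C# 9 or later for top-level statements; `BlockEquals` is a hypothetical name). Note that when the arrays are equal it still has to read every byte; it only reduces the number of comparisons. On .NET Core 2.1 and later, the built-in `a.AsSpan().SequenceEqual(b)` is an alternative worth benchmarking against a hand-written loop like this one.

```csharp
using System;

// Hypothetical helper: cheap logic checks first, then compare 8 bytes at a time.
static bool BlockEquals(byte[] x, byte[] y)
{
    if (object.ReferenceEquals(x, y)) return true;
    if (x is null || y is null) return false;

    // Logic checks: equal lengths, same first and last byte.
    if (x.Length != y.Length) return false;
    if (x.Length == 0) return true;
    if (x[0] != y[0] || x[x.Length - 1] != y[y.Length - 1]) return false;

    int i = 0;
    // Read each 8-byte block as an Int64 so one comparison covers 8 bytes.
    for (; i <= x.Length - 8; i += 8)
    {
        if (BitConverter.ToInt64(x, i) != BitConverter.ToInt64(y, i)) return false;
    }
    // Compare the remaining 0-7 bytes one at a time.
    for (; i < x.Length; i++)
    {
        if (x[i] != y[i]) return false;
    }
    return true;
}

byte[] a = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
byte[] b = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
Console.WriteLine(BlockEquals(a, b)); // True
b[4] = 0;
Console.WriteLine(BlockEquals(a, b)); // False
```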
For this purpose you can use an MD5 hash, which is very quick to compute; see "How do I generate a hashcode from a byte array in C#?".
Getting two identical hash values from such a hash function does not guarantee equality, but in general, if you are comparing arrays of bytes within the same data 'space', you are unlikely to get a collision. By that I mean that different data of the same type should nearly always produce different hashes. There's a lot more around the net on this than I am qualified to explain.
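Since equal hashes only suggest equality, one way to use MD5 here is as a filter: different digests prove the data differs, and a digest match is confirmed with a full comparison. This is a sketch assuming C# 9 or later (top-level statements); `Md5Of` and `EqualsWithDigest` are hypothetical helpers. Hashing still reads every byte, so this only pays off when digests are computed once and reused across many comparisons.

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;

// Hypothetical helper: computes the MD5 digest of a byte array.
static byte[] Md5Of(byte[] data)
{
    using (MD5 md5 = MD5.Create())
    {
        return md5.ComputeHash(data);
    }
}

// Hypothetical helper: precomputed digests act as a fast filter.
// Different digests prove the data differs; equal digests only make
// equality very likely, so that case is confirmed byte by byte.
static bool EqualsWithDigest(byte[] x, byte[] xDigest, byte[] y, byte[] yDigest)
{
    if (x.Length != y.Length) return false;
    if (!xDigest.SequenceEqual(yDigest)) return false;
    return x.SequenceEqual(y);
}

byte[] a = { 1, 2, 3 };
byte[] b = { 1, 2, 3 };
byte[] c = { 1, 2, 4 };
byte[] digestA = Md5Of(a), digestB = Md5Of(b), digestC = Md5Of(c);
Console.WriteLine(EqualsWithDigest(a, digestA, b, digestB)); // True
Console.WriteLine(EqualsWithDigest(a, digestA, c, digestC)); // False
```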
Upvotes: 4
Reputation: 7934
For reference types, GetHashCode by default computes the hash code from the reference, not from the content of the object.
I think you're out of luck: to calculate a hash code for an array, you need to go over its content at least once.
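A content-based hash code that goes over the array once might look like the sketch below (assuming C# 9 or later for top-level statements). `ContentHash` is a hypothetical helper, and 32-bit FNV-1a is my own swapped-in choice of algorithm, not something named in this answer.

```csharp
using System;

// Hypothetical content-based hash (32-bit FNV-1a). Every byte is visited once,
// so arrays with equal contents always get equal hash codes.
static int ContentHash(byte[] data)
{
    unchecked
    {
        uint hash = 2166136261;   // FNV-1a 32-bit offset basis
        foreach (byte value in data)
        {
            hash ^= value;        // mix in the next byte
            hash *= 16777619;     // FNV-1a 32-bit prime
        }
        return (int)hash;
    }
}

byte[] x = { 1, 2, 3 };
byte[] y = { 1, 2, 3 };
Console.WriteLine(ContentHash(x) == ContentHash(y)); // True
```

To use arrays as Dictionary keys, you would pair a hash like this with an element-wise Equals (for example Enumerable.SequenceEqual) inside your own IEqualityComparer<byte[]>.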
Upvotes: 0