Warlock
Warlock

Reputation: 7471

Is a GetHashCode() value unique or not for a string object?

Could anybody help me to understand, how the GetHashCode() method works for string values?

From MSDN I found:

If two string objects are equal, the GetHashCode method returns identical values. However, there is not a unique hash code value for each unique string value. Different strings can return the same hash code.

So, different strings can return the same hash code, hashcode is not unique for strings. Can it lead to bugs in the core of a program?

Upvotes: 2

Views: 963

Answers (5)

Mene
Mene

Reputation: 3799

Well, this can lead to bugs, if your algorithms rely on each string to have a unique hash-value.

For example a hash map (Dictionary in .NET) might fail on collisions (i.e. two objects with the same hash that are not equal), or it doesn't if it handles collisions, that depends on the exact implementation. Fail in that case means: If you add a new object to the map and there is already an object in the map that has the same hash value as the new object, then the new object will override the old one instead of just being added. As far as I know the Dictionary class in .NET can handle collisions though.

If you need more concrete advice you'll need to ask a more concrete question: what are you trying to archive, how are you planning to do it etc.

As a side-note: hash values for strings are usually not unique as the size of a hash value is limited, whereas the length of a string is not. Think of it like this: Say the hash function is MD5 (it's not the default in .Net) and you have a string both consisting of hex-dec chars (0-9A-Z) and the string is 200 characters long: there are 200^16 possible values for the string, but only 32^16 possible values for its hash-value.

Upvotes: 3

Joey
Joey

Reputation: 354356

The documentation is pretty accurate on the guarantees the method makes, actually. The hash code just follows the following two rules (a == b refers to a.Equals(b), #a refers to a.GetHashCode() for readability reasons):

  • If a == b Then #a == #b
  • If #a != #b Then a != b

Note that this is not an equivalence between Equals and a matching hash. If you rely on more than that, well, then your code has a bug, obviously. GetHashCode is intended for using objects as keys in dictionaries so that there is a quick mapping from objects to a number, but it doesn't need to be reversible. If you look at strings you can easily see that the number of possible strings quickly surpasses the number of possible hash codes, so you could have answered that question on your own. You're beyond 232 possible strings at a little more than two characters already.

Upvotes: 2

Piotr Perak
Piotr Perak

Reputation: 11088

Hashcode is used to speed up searching of objects in Hash collections. Internally they store objects in many buckets. Objects being held are divided into buckets based on their hashcode. So when you call for example

var value = Dictionary["someKey"]

dictionary instead of searching in all internal buckets goes directly to the bucket that should contain value under that key. And dictionary searches only in that bucket.

Maybe this isn't exactly how it's implemented but it should be it more or less. So in this case it doesn't matter that different keys in dictionary have same hashcode. That only means that values under that keys will end up in same bucket.

Upvotes: 2

Reed Copsey
Reed Copsey

Reputation: 564323

So, different strings can return the same hash code, hashcode is not unique for strings. Can it lead to bugs in the core of a program?

It should not lead to bugs, provided the hash values are used as intended. Hashes returned by GetHashCode() are not intended to provide unique hashes - this would be impossible, as there are only about 4 billion possible hash codes (as that method returns an Int32), but an infinite number of possible strings.

Hashes are meant to provide few collections, not no collisions. As such, you should never assume that a hash is a unique representation based on value. The only guarantee you get is that two different hash codes for two different strings mean that the strings are not equal, as two equal values should always produce the same hash. Two hash codes that are equal, however, do not necessarily mean the two string values are equal.

Upvotes: 2

David Schwartz
David Schwartz

Reputation: 182743

It can lead to bugs if you assume that a matching hash code means a matching string. Typically, you use the hash code to sort strings into buckets for quick searching and selection. If you detect that two strings have a matching hash code, you would then typically compare the strings themselves for equality.

If that doesn't answer your question, then I don't understand the question.

Upvotes: 3

Related Questions