Reputation:
Although I am aware that two different Strings can return the same hashcode, I have been unable to find anything about two of differing lengths doing so. Is this possible, and if so, examples would be appreciated. This is using the java hashcode function, in case that changes anything.
Upvotes: 2
Views: 1611
Reputation: 3486
Hashcodes are distributed over the space of an int
. The are only 2^32 = ~4 billion
possible values for an int
. There are well more than that number possible strings, so by the pigeonhole principle, there must exist multiple strings with the same hash codes.
However, this does not prove different length strings might have the same hash code, as pointed out below. Java uses the formula s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
for hashing strings. Knowing this, it is easy to construct strings of different length that have the same hash code:
Let String s1 = "\001!";
and String s2 = "@";
. Then s1.length() != s2.length()
but s1.hashCode() == '\001' * 31 + '!' == 1 * 31 + 33 == 64 == s2.hashCode() == '@' == 64
.
However, let me again say that there are over 4 billion possible values of an int
, so your probability of collision is low, although not as low as you might think, because of the Birthday Paradox, which gives that you have about a 50% chance of a collision after about 77K hashes (assuming hashes are randomly distributed, which really depends on your data - if you mostly deal with very small length strings you will have more frequent collisions). Every data structure that uses hashing deals must deal with collisions, though (e.g. a common way is to use linked lists at each hash position), or deal with loss of data (e.g. in a bloom filter).
Upvotes: 3
Reputation: 183201
Yes, this can happen.
Some rather trivial examples:
"foo"
, "\0foo"
, "\0\0foo"
, etc., all have the same hash-code.new String(new char[] { 12, 13 })
has the same hash-code as the single-character new String(new char[] { 12 * 31 + 13 })
(where I selected 12
and 13
arbitrarily; the same works for any other values, as long as the 12 * 31 + 13
analogue stays within the two-byte-unsigned-integer range).But those are just some easy-to-construct examples. There are also plenty of pairs of strings that just happen to work out to have the same hash-code, despite no obvious relationship between them.
Upvotes: 2