congsg2014

Reputation: 75

Is the int value of String.hashCode() unique?

I ran into a problem a few days ago. I have tens of millions of words, stored as strings. I've decided to keep them in a database and use an index to keep them unique, but I don't want to compare the original words to enforce that uniqueness. So I'd like to know whether the value returned by a string's hashCode() method is unique. Also, will it stay the same if I use another laptop, run the program at a different time, or something like that?

Upvotes: 6

Views: 15654

Answers (3)

Abhishek Gharai

Reputation: 237

No.

A string in Java can have up to 2,147,483,647 (2^31 - 1) characters, and every character can vary, so the number of possible strings is enormous. An int, however, can only hold values from -2,147,483,648 to 2,147,483,647 (2^32 distinct values). So unique hash codes for all strings are impossible. The hash code of a string is computed with this formula:

s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1].
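To make the formula concrete, here is a small sketch that evaluates it with Horner's rule (multiply the running total by 31, then add the next character) and prints the result next to the built-in `String.hashCode()`. The class and method names (`StringHashFormula`, `hornerHash`) are made up for illustration; the loop follows the documented formula but is not claimed to be the JDK's actual source.

```java
// Hypothetical helper that re-implements the documented formula
// s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
// using Horner's rule, so no explicit powers of 31 are needed.
class StringHashFormula {

    // Evaluates the polynomial with 32-bit int arithmetic; overflow wraps
    // silently, just as Java int multiplication and addition always do.
    static int hornerHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i); // s.charAt(i) is the character s[i]
        }
        return h;
    }

    public static void main(String[] args) {
        String[] samples = {"", "FB", "Ea", "hello", "tens of millions of words"};
        for (String s : samples) {
            // The two numbers on each line should be identical.
            System.out.println("\"" + s + "\" -> " + hornerHash(s) + " / " + s.hashCode());
        }
    }
}
```

Horner's rule is used only because it avoids computing 31^(n-1) separately; expanding it gives back exactly the sum in the formula.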

Example:

If you create two string variables, "FB" and "Ea", their hash codes will be the same (both are 2236).
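A quick way to confirm this is simply to print both hash codes. The class name below is arbitrary; the expected values follow directly from the formula, since 'F' = 70, 'B' = 66, 'E' = 69 and 'a' = 97.

```java
class HashCollisionDemo {
    public static void main(String[] args) {
        String a = "FB";
        String b = "Ea";
        // 'F' = 70, 'B' = 66:  70 * 31 + 66 = 2236
        // 'E' = 69, 'a' = 97:  69 * 31 + 97 = 2236
        System.out.println(a.hashCode());                 // 2236
        System.out.println(b.hashCode());                 // 2236
        System.out.println(a.equals(b));                  // false: different strings
        System.out.println(a.hashCode() == b.hashCode()); // true: same hash code
    }
}
```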

Upvotes: 10

paxdiablo

Reputation: 881093

Unique, no. By nature, hash values are not guaranteed to be unique.

Any system with an arbitrarily large number of possible inputs and a limited number of outputs will have collisions.
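With String.hashCode() such collisions are easy to produce on demand. "FB" and "Ea" both hash to 2236 and have the same length, so any string made by concatenating k of these two blocks has a hash code that does not depend on which block sits at each position. The sketch below (the `CollisionFamily` class and `build` method are hypothetical names) generates all 2^10 such 20-character strings and counts their distinct hash codes.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class CollisionFamily {

    // Returns all 2^k strings formed by concatenating k blocks,
    // each block being either "Ea" or "FB".
    static List<String> build(int k) {
        List<String> result = new ArrayList<>();
        result.add("");
        for (int i = 0; i < k; i++) {
            List<String> next = new ArrayList<>();
            for (String prefix : result) {
                next.add(prefix + "Ea");
                next.add(prefix + "FB");
            }
            result = next;
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> words = build(10); // 1024 distinct 20-character strings
        Set<Integer> hashes = new HashSet<>();
        for (String w : words) {
            hashes.add(w.hashCode());
        }
        // prints "1024 strings, 1 distinct hash code(s)"
        System.out.println(words.size() + " strings, " + hashes.size() + " distinct hash code(s)");
    }
}
```

This works because hash(p + block) = hash(p) * 31^2 + hash(block) in int arithmetic, and both blocks contribute the same value; increasing k doubles the number of colliding strings each time.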

So, you won't be able to use a unique database key to store them if it's based only on the hash code. You can, however, use a non-unique key to store them.
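A minimal in-memory sketch of the non-unique-key approach follows, assuming the database side is a hash column with an ordinary (non-unique) index plus the full word: the hash only narrows the lookup, and a full string comparison decides whether a word is really a duplicate. The class and method names are hypothetical, and in plain Java `java.util.HashSet<String>` already does essentially this internally; the sketch only shows the lookup pattern a database query would follow.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a table whose hash column has a NON-unique
// index: the hash narrows the search, equals() makes the final decision.
class HashBucketDeduper {
    // hash code -> every stored word that has that hash code
    private final Map<Integer, List<String>> buckets = new HashMap<>();
    private int count = 0;

    // Stores the word and returns true if it is new;
    // returns false if an equal word is already stored.
    boolean add(String word) {
        // Comparable to fetching all rows WHERE hash = word.hashCode().
        List<String> bucket = buckets.computeIfAbsent(word.hashCode(), h -> new ArrayList<>());
        for (String existing : bucket) {
            if (existing.equals(word)) {
                return false; // a real duplicate, not merely a hash collision
            }
        }
        bucket.add(word); // new word, even if another word shares its hash
        count++;
        return true;
    }

    int size() {
        return count;
    }

    public static void main(String[] args) {
        HashBucketDeduper words = new HashBucketDeduper();
        System.out.println(words.add("FB")); // true
        System.out.println(words.add("Ea")); // true: same hash as "FB", different word
        System.out.println(words.add("FB")); // false: genuine duplicate
        System.out.println(words.size());    // 2
    }
}
```

Because most buckets hold only one or a few words, the expensive full comparison runs only against the handful of rows that share a hash code, not against every stored word.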

In reply to your second question about whether different versions of Java will generate different hash codes for the same string, no.

Provided a Java implementation follows the Oracle documentation (otherwise it's not really a Java implementation), it will be consistent across all implementations. The Oracle docs for String.hashCode specify a fixed formula for calculating the hash:

s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]

You may want to check that this is still the case if you're using wildly disparate versions of Java (such as 1.2 vs. 8), but it has been like that for a long time, at least since 1.5.

Upvotes: 11

Manjunath

Reputation: 1685

Below is how the JVM computes the hashCode of a String. As stated, the result depends purely on the individual characters and their positions in the String; nothing in the calculation depends on the JVM or on the type of machine running it, so neither can alter the hash code.

This is also one of the reasons why the String class is declared final (it cannot be extended, which helps preserve its immutability): no one can alter its behaviour.

Below is the relevant part of the spec:

public int hashCode()

Returns a hash code for this string. The hash code for a String object is computed as

s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]

using int arithmetic, where s[i] is the ith character of the string, n is the length of the string, and ^ indicates exponentiation. (The hash value of the empty string is zero.)
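"Using int arithmetic" means intermediate results wrap around modulo 2^32 instead of growing without bound. One way to see this, sketched below with a hypothetical `IntArithmeticDemo` class, is to evaluate the same polynomial exactly with `java.math.BigInteger` and then keep only its low-order 32 bits via `BigInteger.intValue()`; that value should match `hashCode()`, and the empty string should give 0.

```java
import java.math.BigInteger;

class IntArithmeticDemo {

    // Evaluates s[0]*31^(n-1) + ... + s[n-1] exactly, with no overflow.
    static BigInteger exactPolynomial(String s) {
        BigInteger base = BigInteger.valueOf(31);
        BigInteger h = BigInteger.ZERO;
        for (int i = 0; i < s.length(); i++) {
            h = h.multiply(base).add(BigInteger.valueOf(s.charAt(i)));
        }
        return h;
    }

    public static void main(String[] args) {
        String s = "tens of millions of words";
        BigInteger exact = exactPolynomial(s);
        System.out.println("exact value:  " + exact);            // far beyond Integer.MAX_VALUE
        System.out.println("low 32 bits:  " + exact.intValue()); // intValue() keeps the low-order 32 bits
        System.out.println("s.hashCode(): " + s.hashCode());     // same as the previous line
        System.out.println("empty string: " + "".hashCode());    // 0
    }
}
```

Truncating the exact value to 32 bits gives the same result as letting every multiplication and addition wrap, which is why the int-based hash is still fully deterministic even though it overflows for all but very short strings.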

Upvotes: 8
