Reputation: 2371
The code of the equals
method in class String is
public boolean equals(Object anObject) {
if (this == anObject) {
return true;
}
if (anObject instanceof String) {
String anotherString = (String)anObject;
int n = count;
if (n == anotherString.count) {
char v1[] = value;
char v2[] = anotherString.value;
int i = offset;
int j = anotherString.offset;
while (n-- != 0) {
if (v1[i++] != v2[j++])
return false;
}
return true;
}
}
return false;
}
I have a question - why does this method not use hashCode() ?
As far as I know, hashCode() can compare two strings rapidly.
UPDATE: I know, that two unequal strings, can have same hashes. But two equal strings have equal hashes. So, by using hashCode(), we can immediately see that two strings are unequal.
I'm simply thinking that using hashCode() can be a good filter in equals
.
UPDATE 2: Here some code, about we are talking here.
It is an example how String method equals can look like
public boolean equals(Object anObject) {
if (this == anObject) {
return true;
}
if (anObject instanceof String) {
String anotherString = (String)anObject;
if (hashCode() == anotherString.hashCode()){
int n = count;
if (n == anotherString.count) {
char v1[] = value;
char v2[] = anotherString.value;
int i = offset;
int j = anotherString.offset;
while (n-- != 0) {
if (v1[i++] != v2[j++])
return false;
}
return true;
}
}else{
return false;
}
}
return false;
}
Upvotes: 46
Views: 6221
Reputation: 328649
This question has actually been considered by the developers of the JDK. I could not find in the various messages why it has not been included. The enhancement is also listed in the bug database.
Namely, one of the proposed change is:
public boolean equals(Object anObject) {
if (this == anObject) // 1st check identitiy
return true;
if (anObject instanceof String) { // 2nd check type
String anotherString = (String)anObject;
int n = count;
if (n == anotherString.count) { // 3rd check lengths
if (n != 0) { // 4th avoid loading registers from members if length == 0
int h1 = hash, h2 = anotherString.hash;
if (h1 != 0 && h2 != 0 && h1 != h2) // 5th check the hashes
return false;
There was also a discussion to use ==
for interned strings (i.e. if both strings are interned: if (this != anotherString) return false;
).
Upvotes: 16
Reputation: 52538
If the hash code takes the whole content of the string into account, then calculating the hash code of a string with n characters takes n operations. For long strings that's a lot. Comparing two strings takes n operations if they are the same, not longer than calculating the hash. But if the strings are different, then a difference is likely to be found a lot earlier.
String hash functions usually don't consider all characters for very long strings. In that case, if I compare two strings I could first compare the characters used by the hash function, and I'm at least as fast as checking the hashes. But if there is no difference in these characters, then the hash value will be the same, so I have to compare the full strings anyway.
Summary: A good string comparison is never slower but often a lot faster than comparing the hashes (and comparing strings when the hashes match).
Upvotes: 0
Reputation: 1290
As I think, hashCode() can make comparison of two strings quicker.
Arguments?
Arguments against this proposal:
hashcode()
from String
has to access every character in the String and has to do 2
calculations for every character.
So we need for a string with n
characters 5*n
operations (load, multiplication, lookup/load, multiplication, store). Two times, because we compare two Strings. (Ok, one store and one load does not really count in a reasonable implementation.)
For the best case, this makes a total of 10*x
operations for two strings with length m
and n
and x=min(m,n)
. Worst case is 10*x
with x=m=n
. Average somewhere between, perhaps (m*n)/2
.
The current equals implementation needs in the best case 3
operations. 2
loads, 1
compare. Worst is 3*x
operations for two strings with length m
and n
and x=m=n
. Average is somewhere between, perhaps 3*(m*n)/2
.
We have to analyze usage patterns. It could be that most of the time, we will only ask one time for equals, not multiple times. Even if we ask multiple times, it could not be enough to have time savings from the caching.
Not direct against the speed, but still good counterarguments:
We do not expect a hashcode in equals, because we know for sure that hash(a)==hash(b)
for some a!=b
. Everyone reading this (and knowledge about hashing) will wonder what is happening there.
I can already see the next question on SO: "I have a String with some billion times 'a'. Why does it take forever to compare it with equal() against 'b'?" :)
Upvotes: 0
Reputation: 45443
This can be a good idea for many use cases.
However, as a foundation class that is widely used in all kinds of applications, the author really has no idea whether this extra checking can save or hurt performance on average.
I'm gonna guess that the majority of String.equals()
are invoked in a Hashmap, after the hash codes has been known to be equal, so testing hash codes again is pointless.
If we consider comparing 2 random strings, even with a small char set like US ASCII, it is very likely that the hashes are different, and char-by-char comparison fails on 1st char. So it'll be a waste to check hashes.
Upvotes: 4
Reputation: 533530
AFAIK, The following check could be added to String. This check that if the hash codes are set and they are different, then the Strings cannot be equal.
if (hash != 0 && anotherString.hash != 0 && hash != anotherString.hash)
return false;
if (hash32 != 0 && anotherString.hash32 != 0 && hash32 != anotherString.hash32)
return false;
Upvotes: 2
Reputation: 21748
The string hash code is not available for free and automatically. In order to rely on hash code, it must be computed for both strings and only then can be compared. As collisions are possible, the second char-by-char comparison is required if the hash codes are equal.
While String appears as immutable for the usual programmer, it does have the private field to store its hashcode once it is computed. However this field is only computed when hashcode is first required. As you can see from the String source code here:
private int hash;
public int hashCode() {
int h = hash;
if (h == 0) {
...
hash = h;
}
return h;
}
Hence it is not obvious that it makes sense to compute the hashcode first. For your specific case (maybe same instances of really long strings are compared with each other a really many of times), it still may be: profile.
Upvotes: 0
Reputation: 1663
Hashcode could be a first-round check for inequality. However, it presents some tradeoffs.
String
hashcodes are lazily calculated, although they do use a "guard" value. If you're comparing strings with long lifetimes (ie, they're likely to have had the hashcode computed), this isn't a problem. Otherwise, you're stuck with either computing the hashcode (potentially expensive) or ignoring the check when the hashcode hasn't been computed yet. If you have a lot of short-lived strings, you'll be ignoring the check more often than you'll be using it.Upvotes: 41
Reputation: 10151
1) Calculating hashCode may not be faster than comparing the Strings directly.
2) if the hashCode is equal, the Strings may still not be equal
Upvotes: 7