Mike

Reputation: 302

Is a hashCode() method that returns different values for every distinct object the most efficient approach?

I understand that returning the same value for each object is inefficient, but is it the most efficient approach to return distinct values for distinct instances?

If each object gets a different hashCode value then isn't this just like storing them in an ArrayList?

Upvotes: 3

Views: 2085

Answers (5)

pcalcao

Reputation: 15990

No, it's not actually.

Assuming your objects are going to be stored into a HashMap (or Set... doesn't matter, we'll use HashMap here for simplicity), you want your hashCode method to return a result in a way that distributes the objects as evenly as possible.

Ideally, hashCode() returns different values for objects that are not equal, although you can't guarantee this will always be true. On the other hand, if a.equals(b) is true, then a.hashCode() == b.hashCode() must also be true. This is part of the general contract of Object.hashCode().
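As a minimal sketch of that contract, here is a small value class whose hashCode() is computed from the same fields that equals() compares. The Point and ContractDemo names are hypothetical and exist only for illustration.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical value class, used only to illustrate the equals/hashCode contract.
final class Point {
    private final int x;
    private final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Point)) return false;
        Point other = (Point) o;
        return x == other.x && y == other.y;
    }

    // Derived from the same fields as equals(), so equal points
    // always produce equal hash codes.
    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }
}

class ContractDemo {
    // True when a second, equal instance is treated as a duplicate by a HashSet.
    static boolean equalInstancesDeduplicate() {
        Set<Point> points = new HashSet<>();
        points.add(new Point(1, 2));
        return !points.add(new Point(1, 2)); // add() returns false for a duplicate
    }

    public static void main(String[] args) {
        System.out.println(new Point(1, 2).hashCode() == new Point(1, 2).hashCode());
        System.out.println(equalInstancesDeduplicate());
    }
}
```

If hashCode() were left at the default from Object while equals() was overridden, the two equal points would most likely land in different buckets and the set would keep both.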

Besides this, there are performance issues also. Each time two different objects have the same hashCode, they're mapped to the same position in the HashMap (aka, they collide). This means that the HashMap implementation has to handle this collision, which is much more complex than simply storing and retrieving an entry.

There are also plenty of algorithms that rely on values being distributed evenly across a Map, and their performance deteriorates rapidly as the number of collisions increases (some algorithms assume a perfect hash function, meaning that no collisions ever occur: no two different values get mapped to the same position in the Map).

Good examples of this are probabilistic algorithms and data-structures such as Bloom Filters (to use an example that appears to be in fashion these days).

Upvotes: 3

Matt

Reputation: 11805

The HashMap class's major data structure is this:

Entry[] table;

It's important to note that the Entry class (a static, package-private nested class that implements Map.Entry) is actually a linked-list-style structure: each entry holds a reference to the next entry in the same bucket.

When you try to put an element, first the key's hashcode is computed and then transformed into a bucket number. The "bucket" is the index into the above array.

Once you find the bucket, a linear search is done inside that bucket for the exact key (if you don't believe me, look at the HashMap code). If it is found, the value is replaced. If not, the key/value pair is added to that bucket.

For this reason, hashCode() values need not be unique. However, the more distinct and evenly distributed they are, the better your odds of having the entries spread evenly among the buckets. If your hashCode() method returns the same value for all instances, they all end up in the same bucket, which turns your get() method into one long linear search, yielding O(N).

The more distributed the values are, the smaller the buckets, and thus the smaller the linear search component. Values that spread perfectly across the buckets yield constant-time lookup, O(1).
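The effect can be sketched with a toy chained hash table. This is not the real java.util.HashMap; ToyHashMap, Key, BucketDemo and the comparisons counter are all hypothetical names made up for this example, and the bucket index is simply the hash code modulo the table size.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// A toy chained hash table for illustration only, not the real HashMap.
class ToyHashMap<K, V> {
    private static final class Entry<K, V> {
        final K key;
        V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    private final List<LinkedList<Entry<K, V>>> table = new ArrayList<>();
    int comparisons = 0; // number of equals() calls made by put() and get()

    ToyHashMap(int buckets) {
        for (int i = 0; i < buckets; i++) table.add(new LinkedList<>());
    }

    private LinkedList<Entry<K, V>> bucketFor(K key) {
        // floorMod keeps the index non-negative even for negative hash codes.
        return table.get(Math.floorMod(key.hashCode(), table.size()));
    }

    void put(K key, V value) {
        LinkedList<Entry<K, V>> bucket = bucketFor(key);
        for (Entry<K, V> e : bucket) {          // linear search inside the bucket
            comparisons++;
            if (e.key.equals(key)) { e.value = value; return; }
        }
        bucket.add(new Entry<>(key, value));    // key not present: add a new entry
    }

    V get(K key) {
        for (Entry<K, V> e : bucketFor(key)) {  // linear search inside the bucket
            comparisons++;
            if (e.key.equals(key)) return e.value;
        }
        return null;
    }
}

// Hypothetical key whose hashCode() is either constant or derived from its id.
final class Key {
    final int id;
    final boolean constantHash;
    Key(int id, boolean constantHash) { this.id = id; this.constantHash = constantHash; }

    @Override public boolean equals(Object o) {
        return o instanceof Key && ((Key) o).id == id;
    }
    @Override public int hashCode() {
        return constantHash ? 42 : id;
    }
}

class BucketDemo {
    // Number of equals() calls needed to look up the last of 1000 inserted keys.
    static int lookupCost(boolean constantHash) {
        ToyHashMap<Key, String> map = new ToyHashMap<>(1024);
        for (int i = 0; i < 1000; i++) map.put(new Key(i, constantHash), "v" + i);
        map.comparisons = 0;
        map.get(new Key(999, constantHash));
        return map.comparisons;
    }

    public static void main(String[] args) {
        System.out.println(lookupCost(true));   // constant hash: 1000 comparisons
        System.out.println(lookupCost(false));  // spread hash: 1 comparison
    }
}
```

With the constant hash code every key shares one bucket, so the lookup walks all 1000 entries. With the id-based hash code each key sits alone in its bucket and a single comparison is enough.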

Upvotes: 0

Peter Lawrey

Reputation: 533820

You want hashCode() as varied as possible to avoid collisions. If there are no collisions, each key or element will be stored in the underlying array on its own. (A bit like an ArrayList)

The problem is that even if the hashCode() values are different you can still get collisions. This happens because you don't have a bucket for every possible hashCode, so the value has to be reduced to a smaller range, e.g. if you have 16 buckets, the range is 0 to 15. How it does this is complicated, but I am sure you can see that even if all the hashCodes are different, they can still result in a collision (though it's unlikely).
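A simplified sketch of that reduction, assuming 16 buckets and plain bit masking. The bucketIndex helper is hypothetical; the real HashMap also mixes the bits of the hash code before masking, which is left out here.

```java
class BucketIndexDemo {
    // Simplified reduction of a hash code to a bucket index. Assumes the
    // number of buckets is a power of two, so masking works like a modulo.
    static int bucketIndex(int hashCode, int buckets) {
        return hashCode & (buckets - 1);
    }

    public static void main(String[] args) {
        // Integer.hashCode() is the int value itself, so these three keys
        // have three different hash codes: 1, 17 and 33.
        System.out.println(bucketIndex(Integer.valueOf(1).hashCode(), 16));  // 1
        System.out.println(bucketIndex(Integer.valueOf(17).hashCode(), 16)); // 1
        System.out.println(bucketIndex(Integer.valueOf(33).hashCode(), 16)); // 1
    }
}
```

All three distinct hash codes end up in bucket 1, because only the lowest four bits survive the reduction.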

Collisions are a concern for denial-of-service attacks. Normally strings have a low collision rate; however, you can deliberately construct strings that have the same hash code. The question "Why doesn't String's hashCode() cache 0?" gives a list of Strings with a hashCode of 0.
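One well-known way to construct such strings, shown here as a sketch: "Aa" and "BB" have the same hash code (2112), and because String.hashCode() is a polynomial in the characters, any concatenation of those two equal-length blocks collides as well. The StringCollisionDemo class and collidingStrings helper are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

class StringCollisionDemo {
    // Builds 2^blocks distinct strings out of the blocks "Aa" and "BB".
    // All of them share one hash code, because "Aa".hashCode() == "BB".hashCode().
    static List<String> collidingStrings(int blocks) {
        List<String> result = new ArrayList<>();
        result.add("");
        for (int i = 0; i < blocks; i++) {
            List<String> next = new ArrayList<>();
            for (String s : result) {
                next.add(s + "Aa");
                next.add(s + "BB");
            }
            result = next;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("Aa".hashCode() + " " + "BB".hashCode()); // 2112 2112
        for (String s : collidingStrings(2)) {
            System.out.println(s + " -> " + s.hashCode());
        }
    }
}
```

Each extra block doubles the number of colliding keys, so a long list of keys that all fall into the same bucket is cheap to generate.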

Upvotes: 1

René Jensen

Reputation: 451

The hashCode() method isn't suited for placing objects in an ArrayList. Although it does return the same value for the same object every time, two objects could quite possibly have the same hashcode.

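Instead, the hashCode() method is called on the key object when storing items in, for example, a HashMap, which uses it to pick a bucket and then relies on equals() to tell apart keys that share a hash code.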

Upvotes: 0

Marko Topolnik

Reputation: 200266

hashCode must be consistent with equals; that's the number one priority. If no two objects are equal, then returning a distinct value for each of them would be desirable. Bear in mind that if your object has more than 32 bits of state, it is theoretically impossible to provide a perfectly spread hash code, because hashCode() returns an int and there are only 2^32 possible values.
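To illustrate the 32-bit limit with a standard class: a long carries 64 bits of state, so Long.hashCode has to map several distinct values to the same int. The PigeonholeDemo class and sameHash helper below are hypothetical names for this sketch.

```java
class PigeonholeDemo {
    // True when two long values produce the same hash code.
    static boolean sameHash(long a, long b) {
        return Long.hashCode(a) == Long.hashCode(b);
    }

    public static void main(String[] args) {
        // Long.hashCode(v) is (int) (v ^ (v >>> 32)), so the upper and
        // lower halves are folded together and can cancel each other out.
        System.out.println(Long.hashCode(0L));          // 0
        System.out.println(Long.hashCode(-1L));         // 0
        System.out.println(Long.hashCode(4294967297L)); // 0
        System.out.println(sameHash(0L, -1L));          // true
    }
}
```

Three unequal values share the hash code 0 here, which is unavoidable once there are more possible states than int values.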

Upvotes: 4
