Reputation: 85
What is the reason to make unique hashCode for hash-based collection to work faster?And also what is with not making hashCode mutable?
I read it here but didn't understand, so I read on some other resources and ended up with this question.
Thanks.
Upvotes: 0
Views: 168
Reputation: 89
You can define a customized class extending from HashMap. Then you override the methods (get, put, remove, containsKey, containsValue) by comparing keys and values only with equals method. Then you add some constructors. Overriding correctly the hashcode method is very hard.
I hope I have helped everybody who wants to use easily a hashmap.
Upvotes: 0
Reputation: 81115
The basic idea of hashing is that if one is looking in a collection for an object whose hash code differs from that of 99% of the objects in that collection, one only need examine the 1% of objects whose hash code matches. If the hashcode differs from that of 99.9% of the objects in the collection, one only need examine 0.1% of the objects. In many cases, even if a collection has a million objects, a typical object's hash code will only match a very tiny fraction of them (in many cases, less than a dozen). Thus, a single hash computation may eliminate the need for nearly a million comparisons.
Note that it's not necessary for hash values to be absolutely unique, but performance may be very bad if too many instances share the same hash code. Note that what's important for performance is not the total number of distinct hash values, but rather the extent to which they're "clumped". Searching for an object which is in a collection of a million things in which half the items all have one hash value and each remaining items each have a different value will require examining on average about 250,000 items. By contrast, if there were 100,000 different hash values, each returned by ten items, searching for an object would require examining about five.
Upvotes: 0
Reputation: 32507
As long, as you are working with collections that you are retrieving elements from by index (0,1,2... collection.size()-1) than you don't need hashcode. However, if we are talking about associative collections like maps, or simply asking collection does it contain some elements
than we are talkig about expensive operations.
Hashcode is like digest of provided object. It is robust and unique. Hashcode is generally used for binary comparisions. It is not that expensive to compare on binary level hashcode of every collection's member, as comparing every object by it's properties (more than 1 operation for sure). Hashcode needs to be like a fingerprint - one entity - one, and unmutable hashcode.
Upvotes: 0
Reputation: 132350
Hashcodes don't have to be unique, but they work better if distinct objects have distinct hashcodes.
A common use for hashcodes is for storing and looking objects in data structures like HashMap
. These collections store objects in "buckets" and the hashcode of the object being stored is used to determine which bucket it's stored in. This speeds up retrieval. When looking for an object, instead of having to look through all of the objects, the HashMap
uses the hashcode to determine which bucket to look in, and it looks only in that bucket.
You asked about mutability. I think what you're asking about is the requirement that an object stored in a HashMap
not be mutated while it's in the map, or preferably that the object be immutable. The reason is that, in general, mutating an object will change its hashcode. If an object were stored in a HashMap
, its hashcode would be used to determine which bucket it gets stored in. If that object is mutated, its hashcode would change. If the object were looked up at this point, a different hashcode would result. This might point HashMap
to the wrong bucket, and as a result the object might not be found even though it was previously stored in that HashMap
.
Upvotes: 4
Reputation: 4671
Why searching by hashing is faster?
lets say you have some unique objects as values and you have a String
as their keys. Each keys should be unique so that when the key is searched, you find the relevant object it holds as its value.
now lets say you have 1000 such key value pairs, you want to search for a particular key and retrieve its value. if you don't have hashing, you would then need to compare your key with all the entries in your table and look for the key.
But with hashing, you hash your key and put the corresponding object in a certain bucket on insertion. now when you want to search for a particular key, the key you want to search will be hashed and its hash value will be determined. And you can go to that hash bucket straight, and pick your object without having to search through the entire key entries.
Upvotes: 1
Reputation: 1575
hashCode
is a tricky method. It is supposed to provide a shorthand to equality (which is what maps and sets care about). If many objects in your map share the same hashcode, the map will have to check equals
frequently - which is generally much more expensive.
Check the javadoc for equals
- that method is very tricky to get right even for immutable objects, and using a mutable object as a map key is just asking for trouble (since the object is stored for its "old" hashcode)
Upvotes: 0
Reputation: 2322
It has to do with the way items are stored in a hash table. A hash table will use the element's hash code to store and retrieve it. It's somewhat complicated to fully explain here but you can learn about it by reading this section: http://www.brpreiss.com/books/opus5/html/page206.html#SECTION009100000000000000000
Upvotes: 1
Reputation: 86744
Hash codes are not required to be unique, they just have a very low likelihood of collisions.
As to hash codes being immutable, that is required only if an object is going to be used as a key in a HashMap
. The hash code tells the HashMap
where to do its initial probe into the bucket array. If a key's hash code were to change, then the map would no longer look in the correct bucket and would be unable to find the entry.
Upvotes: 2
Reputation: 16215
hashcode()
is basically a function that converts an object into a number. In the case of hash based collections, this number is used to help lookup the object. If this number changes, it means the hash based collection may be storing the object incorrectly, and can no longer retrieve it.
Uniqueness of hash values allows a more even distribution of objects within the internals of the collection, which improves the performance. If everything hashed to the same value (worst case), performance may degrade.
The wikipedia article on hash tables provides a good read that may help explain some of this as well.
Upvotes: 1