Reputation: 2135
I am looking at the source code for HashMap
in Java 7, and I see that the put
method will check if any entry is already present and if it is present then it will replace the old value with the new value.
for (Entry<K,V> e = table[i]; e != null; e = e.next) {
    Object k;
    if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
        V oldValue = e.value;
        e.value = value;
        e.recordAccess(this);
        return oldValue;
    }
}
So, basically, this means there will only ever be one entry for a given key (I have verified this by debugging as well, but please correct me if I am wrong).
Now, since there is only one entry for a given key, why does the get
method have a FOR loop, when it could simply have returned the value directly?
for (Entry<K,V> e = table[indexFor(hash, table.length)];
     e != null;
     e = e.next) {
    Object k;
    if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
        return e.value;
}
I feel the above loop is unnecessary. Please help me understand if I am wrong.
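The replace-and-return behavior described above can be confirmed with a small sketch against the standard HashMap (names here are just for illustration):

```java
import java.util.HashMap;
import java.util.Map;

public class PutReplaceDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        // First put: no previous mapping for the key, so put returns null
        System.out.println(map.put("a", 1)); // null
        // Second put with the same key: old value is replaced and returned
        System.out.println(map.put("a", 2)); // 1
        // Only one entry ever exists for the key
        System.out.println(map.size());      // 1
        System.out.println(map.get("a"));    // 2
    }
}
```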
Upvotes: 45
Views: 6348
Reputation: 14668
I think @Eran has already answered your query well, and @Prashant has also made a good attempt, along with the other people who have answered, so let me explain it using an example to make the concept very clear.
Basically, what @Eran is saying is that in a given bucket (i.e. at a given index of the array) it is possible for there to be more than one entry (nothing but an Entry
object), and this is possible when 2 or more keys give different hashes but the same index/bucket location.
Now, in order to put an entry into the hashmap, this is what happens at a high level (read carefully, because I have gone the extra mile to explain some good things which are otherwise not part of your question):
First, the hashCode of the key object is computed.
Then, from that hashCode, a hash is calculated (this extra step is done so as to mitigate the risk of a poorly written hash function).
Finally, that hash is mapped to an index, i.e. a bucket location in the array.
When a situation occurs where 2 keys give different hashes but the same index, then both of those will go into the same bucket, and that is the reason the FOR loop is important.
Below is a simple example I have created to demonstrate the concept to you:
public class Person {

    private int id;

    Person(int _id) {
        id = _id;
    }

    public int getId() {
        return id;
    }

    public void setId(int id) {
        this.id = id;
    }

    @Override
    public int hashCode() {
        return id;
    }
}
Test class:
import java.util.Map;

public class HashMapHashingTest {

    public static void main(String[] args) {
        Person p1 = new Person(129);
        Person p2 = new Person(133);
        Map<Person, String> hashMap = new MyHashMap<>(2);
        hashMap.put(p1, "p1");
        hashMap.put(p2, "p2");
        System.out.println(hashMap);
    }
}
Debug screenshot (please click and zoom, as it appears small):
Notice that, in the above example, the two Person
objects give different hash values (136 and 140, respectively) but the same index of 0, so both objects go into the same bucket. In the screenshot, you can see that both objects are at index 0
, and there is also a next
pointer populated, which points to the second object.
Another interesting thing: if you override the hashCode
method to always return the same int value, then all objects of that class would give the same index/bucket location, but since you have not overridden the equals
method, they would not be considered the same and hence would form a list at that index/bucket location.
Another twist on this: suppose you override the equals
method as well and make all objects compare equal; then only one object will be present at that index/bucket location, because all objects are equal and each put replaces the previous entry.
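The two scenarios just described can be sketched with throwaway classes (the names below are made up for illustration):

```java
import java.util.HashMap;
import java.util.Map;

public class OverrideDemo {

    // hashCode overridden to a constant, equals NOT overridden:
    // every instance lands in the same bucket, but no two are "equal",
    // so they form a list at that bucket
    static class HashOnly {
        @Override public int hashCode() { return 42; }
    }

    // hashCode AND equals overridden so that all instances are equal:
    // every put replaces the previous entry
    static class HashAndEquals {
        @Override public int hashCode() { return 42; }
        @Override public boolean equals(Object o) { return o instanceof HashAndEquals; }
    }

    public static void main(String[] args) {
        Map<HashOnly, String> m1 = new HashMap<>();
        m1.put(new HashOnly(), "a");
        m1.put(new HashOnly(), "b");
        System.out.println(m1.size()); // 2 - same bucket, distinct keys form a list

        Map<HashAndEquals, String> m2 = new HashMap<>();
        m2.put(new HashAndEquals(), "a");
        m2.put(new HashAndEquals(), "b");
        System.out.println(m2.size()); // 1 - second put replaced the first
    }
}
```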
Upvotes: 4
Reputation: 936
I would like to put it in simple words: the put
method has a FOR loop to iterate over the list of keys that fall into the same bucket.
What happens when you put
a key-value
pair into the hashmap:
For the key
you pass to the HashMap
, it will calculate the hashCode.
Many keys
can fall into the same hashCode
bucket. Now HashMap will check whether the same key
is already present or not in that bucket.
So in the average case its time complexity is O(1)
, and in the worst case its time complexity is O(N)
.
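As a sketch of the bucket-index step: in the Java 7 source, indexFor simply masks the hash with table.length - 1 (the table length is always a power of two), so two different hashes share a bucket whenever their low bits agree. The class and values below are illustrative:

```java
public class IndexForDemo {
    // Same computation as Java 7's HashMap.indexFor(hash, table.length):
    // with a power-of-two length, this keeps only the low bits of the hash
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int tableLength = 16; // HashMap's default capacity
        // 118 and 102 are different hashes, but share the same low 4 bits,
        // so both keys would land in bucket 6 and the loop must disambiguate
        System.out.println(indexFor(118, tableLength)); // 6
        System.out.println(indexFor(102, tableLength)); // 6
    }
}
```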
Upvotes: 0
Reputation: 1977
While the other answers explain what is going on, OP's comments on those answers lead me to think a different angle of explanation is required.
Let's say you are going to toss 10 strings into a hash map: "A", "B", "C", "Hi", "Bye", "Yo", "Yo-yo", "Z", "1", "2"
You are using HashMap
as your hash map instead of making your own hash map (good choice). Some of the stuff below will not use HashMap
implementation directly but will approach it from a more theoretical and abstract point of view.
HashMap
does not magically know that you are going to add 10 strings to it, nor does it know what strings you will be putting into it later. It has to provide places to put whatever you might give to it... for all it knows you are going to put 100,000 strings in it - perhaps every word in the dictionary.
Let's say that, because of the constructor argument you chose when making your new HashMap(n)
that your hash map has 20 buckets. We'll call them bucket[0]
through bucket[19]
.
map.put("A", value);
Let's say that the hash value for "A" is 5. The hash map can now do bucket[5] = new Entry("A", value);
map.put("B", value);
Assume hash("B") = 3. So, bucket[3] = new Entry("B", value);
map.put("C", value);
- hash("C") = 19 - bucket[19] = new Entry("C", value);
map.put("Hi", value);
Now here's where it gets interesting. Let's say that your hash function is such that hash("Hi") = 3. So now hash map wants to do bucket[3] = new Entry("Hi", value);
We have a problem! bucket[3]
is where we put the key "B", and "Hi" is definitely a different key than "B"... but they have the same hash value. We have a collision!
Because of this possibility, the HashMap
is not actually implemented this way. A hash map needs to have buckets that can hold more than 1 entry in them. NOTE: I did not say more than 1 entry with the same key, as we cannot have that, but it needs to have buckets that can hold more than 1 entry of different keys. We need a bucket that can hold both "B" and "Hi".
So let's not do bucket[n] = new Entry(key, value);
, but instead let's have bucket
be of type Bucket[]
instead of Entry[]
. So now we do bucket[n].add( new Entry(key, value) );
So let's change to...
bucket[3].add("B", value);
and
bucket[3].add("Hi", value);
As you can see, we now have the entries for "B" and "Hi" in the same bucket. Now, when we want to get them back out, we need to loop through everything in the bucket, for example, with a for loop.
So the looping is present because of the collisions. Not collisions of key
, but collisions of hash(key)
.
You might be asking at this point, "Wait, WHAT!?! Why would we do such a weird thing like that??? Why are we using such a contrived and convoluted data structure???" The answer to that question would be...
A hash map works like this because of the properties that such a peculiar setup provides to us due to the way the math works out. If you use a good hash function which minimizes conflicts, and if you size your HashMap
to have more buckets than the number of entries that you guess will be in it, then you have an optimized hash map which will be the fastest data structure for inserts and queries of complex data.
Since you say you are often seeing this for-loop being iterated over with multiple elements in your debugging, that means that your HashMap
might be too small. If you have a reasonable guess as to how many things you might put into it, try to set the size to be larger than that. Notice in my example above that I was inserting 10 strings but had a hash map with 20 buckets. With a good hash function, this will yield very few collisions.
Note: the above example is a simplification of the matter and does take some shortcuts for brevity. A full explanation is even slightly more complicated, but everything you need to know to answer the question as asked is here.
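The bucket-of-entries scheme described above can be sketched as a toy map (ToyMap is a made-up name; unlike the real HashMap, it does no hash spreading, no resizing, and no null handling):

```java
import java.util.ArrayList;
import java.util.List;

// A deliberately tiny hash map sketch: fixed bucket count, just the
// bucket + loop structure under discussion.
public class ToyMap<K, V> {

    private static class Entry<K, V> {
        final K key;
        V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    private final List<List<Entry<K, V>>> buckets;

    public ToyMap(int bucketCount) {
        buckets = new ArrayList<>(bucketCount);
        for (int i = 0; i < bucketCount; i++) buckets.add(new ArrayList<>());
    }

    private List<Entry<K, V>> bucketFor(K key) {
        // floorMod keeps the index non-negative even for negative hashCodes
        int index = Math.floorMod(key.hashCode(), buckets.size());
        return buckets.get(index);
    }

    public V put(K key, V value) {
        List<Entry<K, V>> bucket = bucketFor(key);
        for (Entry<K, V> e : bucket) {       // loop over colliding entries
            if (e.key.equals(key)) {         // same key -> replace, as the OP saw
                V old = e.value;
                e.value = value;
                return old;
            }
        }
        bucket.add(new Entry<>(key, value)); // new key -> append to this bucket
        return null;
    }

    public V get(K key) {
        for (Entry<K, V> e : bucketFor(key)) { // the loop the question asks about
            if (e.key.equals(key)) return e.value;
        }
        return null;
    }

    public static void main(String[] args) {
        ToyMap<String, String> map = new ToyMap<>(20);
        map.put("B", "valueB");
        map.put("Hi", "valueHi"); // may share B's bucket; the loop disambiguates
        System.out.println(map.get("B"));  // valueB
        System.out.println(map.get("Hi")); // valueHi
    }
}
```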
Upvotes: 2
Reputation: 344
Hash tables have buckets because the hashes of objects do not have to be unique. If the hashes of two objects are equal, the objects are probably equal; if their hashes differ, the objects are definitely different. Therefore, objects with the same hash are grouped into a bucket, and the for loop is used to iterate over the objects contained in such a bucket.
In fact, this means that the algorithmic complexity of finding an object in such a hash table is not strictly constant (although on average very close to it): it degrades with the number of colliding entries, toward linear in the bucket size, or logarithmic once Java 8 converts a large bin into a tree.
Upvotes: 1
Reputation: 371
If you look at the internal working of the get method of HashMap (Java 7):
public V get(Object key) {
    if (key == null)
        return getForNullKey();
    int hash = hash(key.hashCode());
    for (Entry<K,V> e = table[indexFor(hash, table.length)];
         e != null;
         e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
            return e.value;
    }
    return null;
}
Sometimes there may be hashcode collisions; to resolve them, HashMap uses equals() and stores the colliding elements in a linked list in the same bucket.
Fetch the data for key vaibhav: map.get(new Key("vaibhav"));
Steps:
Calculate the hash code of the Key {"vaibhav"}. It will be generated as 118.
Calculate the index by using the index method; it will be 6.
Go to index 6 of the array and compare the first element's key with the given key. If both are equal, return the value; otherwise, check the next element if it exists.
In our case, it is not found as the first element, and the next of the node object is not null.
If next of the node is null, return null.
If next of the node is not null, traverse to the second element and repeat from step 3 until the key is found or next is null.
A for loop is used for this retrieval process.
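As a hedged illustration, the numbers in the steps above (hash 118, index 6) are consistent with a Key class whose hashCode returns the first character: 'v' is 118, and 118 & (16 - 1) is 6 in a default 16-bucket table. The Key class below is an assumption chosen to match those numbers, not necessarily the original one:

```java
import java.util.HashMap;
import java.util.Map;

public class KeyLookupDemo {

    // Hypothetical Key class matching the numbers in the steps above:
    // "vaibhav".charAt(0) is 'v' == 118, and 118 & (16 - 1) == 6
    static class Key {
        private final String name;
        Key(String name) { this.name = name; }
        @Override public int hashCode() { return name.charAt(0); }
        @Override public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).name.equals(name);
        }
    }

    public static void main(String[] args) {
        Map<Key, String> map = new HashMap<>(); // default capacity 16
        map.put(new Key("vishal"), "20");  // also hashes to 118 -> same bucket
        map.put(new Key("vaibhav"), "40");
        // get walks the bucket's list, using equals to pick the right entry
        System.out.println(map.get(new Key("vaibhav"))); // 40
    }
}
```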
Upvotes: 18
Reputation: 120858
For the record, in Java 8 this loop is also present (sort of, since there are TreeNode
s as well):
if ((e = first.next) != null) {
    if (first instanceof TreeNode)
        return ((TreeNode<K,V>)first).getTreeNode(hash, key);
    do {
        if (e.hash == hash &&
            ((k = e.key) == key || (key != null && key.equals(k))))
            return e;
    } while ((e = e.next) != null);
}
Basically (for the case when the bin is not a tree), iterate the entire bin until you find the entry we are looking for.
Looking at this implementation, you can see why providing a good hash matters: with a poor hash, many entries end up in the same bucket, and searching takes longer.
Upvotes: 5
Reputation: 393846
table[indexFor(hash, table.length)]
is the bucket of the HashMap
that may contain the key we are looking for (if it is present in the Map
).
However, each bucket may contain multiple entries (either different keys having the same hashCode()
, or different keys with different hashCode()
that still got mapped to the same bucket), so you must iterate over these entries until you find the key you are looking for.
Since the expected number of entries in each bucket should be very small, this loop is still executed in expected O(1)
time.
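A concrete example of different keys with the same hashCode(): the strings "Aa" and "BB" famously share the hash 2112, so they always land in the same bucket, and it is exactly the loop's equals check that keeps them apart:

```java
import java.util.HashMap;
import java.util.Map;

public class StringCollisionDemo {
    public static void main(String[] args) {
        // "Aa" and "BB" are the classic String hashCode collision
        System.out.println("Aa".hashCode() == "BB".hashCode()); // true (both 2112)

        Map<String, Integer> map = new HashMap<>();
        map.put("Aa", 1);
        map.put("BB", 2);
        // Same hash, same bucket - but equals() in the loop keeps them apart
        System.out.println(map.get("Aa")); // 1
        System.out.println(map.get("BB")); // 2
        System.out.println(map.size());    // 2
    }
}
```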
Upvotes: 62