Reputation: 2135
I am looking at the source code for HashMap
in Java 7, and I see that the put
method will check if any entry is already present and if it is present then it will replace the old value with the new value.
for (Entry<K,V> e = table[i]; e != null; e = e.next) {
    Object k;
    if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
        V oldValue = e.value;
        e.value = value;
        e.recordAccess(this);
        return oldValue;
    }
}
So, basically, this means there will only ever be one entry for a given key (I have verified this by debugging as well, but please correct me if I am wrong).
Now, since there is only one entry for a given key, why does the get
method have a FOR loop, when it could simply have returned the value directly?
for (Entry<K,V> e = table[indexFor(hash, table.length)];
     e != null;
     e = e.next) {
    Object k;
    if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
        return e.value;
}
I feel the above loop is unnecessary. Please help me understand if I am wrong.
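The replace-and-return behavior described above can be confirmed with a small sketch against the standard HashMap (names here are just for illustration):

```java
import java.util.HashMap;
import java.util.Map;

public class PutReplaceDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        // First put: no previous mapping for the key, so put returns null
        System.out.println(map.put("a", 1)); // null
        // Second put with the same key: old value is replaced and returned
        System.out.println(map.put("a", 2)); // 1
        // Only one entry ever exists for the key
        System.out.println(map.size());      // 1
        System.out.println(map.get("a"));    // 2
    }
}
```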
Upvotes: 45
Views: 6348
Reputation: 14668
I think @Eran has already answered your query well, and @Prashant has also made a good attempt, along with the other people who have answered, so let me explain it using an example to make the concept very clear.
Basically, what @Eran is saying is that in a given bucket (i.e. at a given index of the array) it is possible for there to be more than one entry (nothing but an Entry
object), and this is possible when 2 or more keys give different hashes but the same index/bucket location.
Now, in order to put an entry into the hashmap, this is what happens at a high level (read carefully, because I have gone the extra mile to explain some good things which are otherwise not part of your question):
First, the hashCode of the key object is computed.
Then, from that hashCode, a hash is calculated (this extra step is done so as to mitigate the risk of a poorly written hash function).
Finally, that hash is mapped to an index, i.e. a bucket location in the array.
When a situation occurs where 2 keys give different hashes but the same index, then both of those will go into the same bucket, and that is the reason the FOR loop is important.
Below is a simple example I have created to demonstrate the concept to you:
public class Person {

    private int id;

    Person(int _id) {
        id = _id;
    }

    public int getId() {
        return id;
    }

    public void setId(int id) {
        this.id = id;
    }

    @Override
    public int hashCode() {
        return id;
    }
}
Test class:
import java.util.Map;

public class HashMapHashingTest {

    public static void main(String[] args) {
        Person p1 = new Person(129);
        Person p2 = new Person(133);
        Map<Person, String> hashMap = new MyHashMap<>(2);
        hashMap.put(p1, "p1");
        hashMap.put(p2, "p2");
        System.out.println(hashMap);
    }
}
Debug screenshot (please click and zoom, as it appears small):
Notice that, in the above example, the two Person
objects give different hash values (136 and 140, respectively) but the same index of 0, so both objects go into the same bucket. In the screenshot, you can see that both objects are at index 0
, and there is also a next
pointer populated, which points to the second object.
Another interesting thing: if you override the hashCode
method to always return the same int value, then all objects of that class would give the same index/bucket location, but since you have not overridden the equals
method, they would not be considered the same and hence would form a list at that index/bucket location.
Another twist on this: suppose you override the equals
method as well and make all objects compare equal; then only one object will be present at that index/bucket location, because all objects are equal and each put replaces the previous entry.
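The two scenarios just described can be sketched with throwaway classes (the names below are made up for illustration):

```java
import java.util.HashMap;
import java.util.Map;

public class OverrideDemo {

    // hashCode overridden to a constant, equals NOT overridden:
    // every instance lands in the same bucket, but no two are "equal",
    // so they form a list at that bucket
    static class HashOnly {
        @Override public int hashCode() { return 42; }
    }

    // hashCode AND equals overridden so that all instances are equal:
    // every put replaces the previous entry
    static class HashAndEquals {
        @Override public int hashCode() { return 42; }
        @Override public boolean equals(Object o) { return o instanceof HashAndEquals; }
    }

    public static void main(String[] args) {
        Map<HashOnly, String> m1 = new HashMap<>();
        m1.put(new HashOnly(), "a");
        m1.put(new HashOnly(), "b");
        System.out.println(m1.size()); // 2 - same bucket, distinct keys form a list

        Map<HashAndEquals, String> m2 = new HashMap<>();
        m2.put(new HashAndEquals(), "a");
        m2.put(new HashAndEquals(), "b");
        System.out.println(m2.size()); // 1 - second put replaced the first
    }
}
```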
Upvotes: 4
Reputation: 936
I would like to put it in simple words: the put
method has a FOR loop to iterate over the list of keys that fall into the same bucket.
What happens when you put
a key-value
pair into the hashmap:
For the key
you pass to the HashMap
, it will calculate the hashCode.
Many keys
can fall into the same hashCode
bucket. Now HashMap will check whether the same key
is already present or not in that bucket.
So in the average case its time complexity is O(1)
, and in the worst case its time complexity is O(N)
.
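As a sketch of the bucket-index step: in the Java 7 source, indexFor simply masks the hash with table.length - 1 (the table length is always a power of two), so two different hashes share a bucket whenever their low bits agree. The class and values below are illustrative:

```java
public class IndexForDemo {
    // Same computation as Java 7's HashMap.indexFor(hash, table.length):
    // with a power-of-two length, this keeps only the low bits of the hash
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int tableLength = 16; // HashMap's default capacity
        // 118 and 102 are different hashes, but share the same low 4 bits,
        // so both keys would land in bucket 6 and the loop must disambiguate
        System.out.println(indexFor(118, tableLength)); // 6
        System.out.println(indexFor(102, tableLength)); // 6
    }
}
```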
Upvotes: 0
Reputation: 1977
While the other answers explain what is going on, OP's comments on those answers lead me to think a different angle of explanation is required.
Let's say you are going to toss 10 strings into a hash map: "A", "B", "C", "Hi", "Bye", "Yo", "Yo-yo", "Z", "1", "2"
You are using HashMap
as your hash map instead of making your own hash map (good choice). Some of the stuff below will not use HashMap
implementation directly but will approach it from a more theoretical and abstract point of view.
HashMap
does not magically know that you are going to add 10 strings to it, nor does it know what strings you will be putting into it later. It has to provide places to put whatever you might give to it... for all it knows you are going to put 100,000 strings in it - perhaps every word in the dictionary.
Let's say that, because of the constructor argument you chose when making your new HashMap(n)
that your hash map has 20 buckets. We'll call them bucket[0]
through bucket[19]
.
map.put("A", value);
Let's say that the hash value for "A" is 5. The hash map can now do bucket[5] = new Entry("A", value);
map.put("B", value);
Assume hash("B") = 3. So, bucket[3] = new Entry("B", value);
map.put("C", value);
- hash("C") = 19 - bucket[19] = new Entry("C", value);
map.put("Hi", value);
Now here's where it gets interesting. Let's say that your hash function is such that hash("Hi") = 3. So now hash map wants to do bucket[3] = new Entry("Hi", value);
We have a problem! bucket[3]
is where we put the key "B", and "Hi" is definitely a different key than "B"... but they have the same hash value. We have a collision!
Because of this possibility, the HashMap
is not actually implemented this way. A hash map needs to have buckets that can hold more than 1 entry in them. NOTE: I did not say more than 1 entry with the same key, as we cannot have that, but it needs to have buckets that can hold more than 1 entry of different keys. We need a bucket that can hold both "B" and "Hi".
So let's not do bucket[n] = new Entry(key, value);
, but instead let's have bucket
be of type Bucket[]
instead of Entry[]
. So now we do bucket[n].add( new Entry(key, value) );
So let's change to...
bucket[3].add("B", value);
and
bucket[3].add("Hi", value);
As you can see, we now have the entries for "B" and "Hi" in the same bucket. Now, when we want to get them back out, we need to loop through everything in the bucket, for example, with a for loop.
So the looping is present because of the collisions. Not collisions of key
, but collisions of hash(key)
.
You might be asking at this point, "Wait, WHAT!?! Why would we do such a weird thing like that??? Why are we using such a contrived and convoluted data structure???" The answer to that question would be...
A hash map works like this because of the properties that such a peculiar setup provides to us due to the way the math works out. If you use a good hash function which minimizes conflicts, and if you size your HashMap
to have more buckets than the number of entries that you guess will be in it, then you have an optimized hash map which will be the fastest data structure for inserts and queries of complex data.
Since you say you are often seeing this for-loop being iterated over with multiple elements in your debugging, that means that your HashMap
might be too small. If you have a reasonable guess as to how many things you might put into it, try to set the size to be larger than that. Notice in my example above that I was inserting 10 strings but had a hash map with 20 buckets. With a good hash function, this will yield very few collisions.
Note: the above example is a simplification of the matter and does take some shortcuts for brevity. A full explanation is even slightly more complicated, but everything you need to know to answer the question as asked is here.
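The bucket-of-entries scheme described above can be sketched as a toy map (ToyMap is a made-up name; unlike the real HashMap, it does no hash spreading, no resizing, and no null handling):

```java
import java.util.ArrayList;
import java.util.List;

// A deliberately tiny hash map sketch: fixed bucket count, just the
// bucket + loop structure under discussion.
public class ToyMap<K, V> {

    private static class Entry<K, V> {
        final K key;
        V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    private final List<List<Entry<K, V>>> buckets;

    public ToyMap(int bucketCount) {
        buckets = new ArrayList<>(bucketCount);
        for (int i = 0; i < bucketCount; i++) buckets.add(new ArrayList<>());
    }

    private List<Entry<K, V>> bucketFor(K key) {
        // floorMod keeps the index non-negative even for negative hashCodes
        int index = Math.floorMod(key.hashCode(), buckets.size());
        return buckets.get(index);
    }

    public V put(K key, V value) {
        List<Entry<K, V>> bucket = bucketFor(key);
        for (Entry<K, V> e : bucket) {       // loop over colliding entries
            if (e.key.equals(key)) {         // same key -> replace, as the OP saw
                V old = e.value;
                e.value = value;
                return old;
            }
        }
        bucket.add(new Entry<>(key, value)); // new key -> append to this bucket
        return null;
    }

    public V get(K key) {
        for (Entry<K, V> e : bucketFor(key)) { // the loop the question asks about
            if (e.key.equals(key)) return e.value;
        }
        return null;
    }

    public static void main(String[] args) {
        ToyMap<String, String> map = new ToyMap<>(20);
        map.put("B", "valueB");
        map.put("Hi", "valueHi"); // may share B's bucket; the loop disambiguates
        System.out.println(map.get("B"));  // valueB
        System.out.println(map.get("Hi")); // valueHi
    }
}
```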
Upvotes: 2
Reputation: 344
Hash tables have buckets because the hashes of objects do not have to be unique. If the hashes of two objects are equal, the objects are probably equal; if their hashes differ, the objects are definitely different. Therefore, objects with the same hash are grouped into a bucket, and the for loop is used to iterate over the objects contained in such a bucket.
In fact, this means that the algorithmic complexity of finding an object in such a hash table is not strictly constant (although on average very close to it): it degrades with the number of colliding entries, toward linear in the bucket size, or logarithmic once Java 8 converts a large bin into a tree.
Upvotes: 1
Reputation: 371
If you look at the internal working of the get method of HashMap (Java 7):
public V get(Object key) {
    if (key == null)
        return getForNullKey();
    int hash = hash(key.hashCode());
    for (Entry<K,V> e = table[indexFor(hash, table.length)];
         e != null;
         e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
            return e.value;
    }
    return null;
}
Sometimes there may be hashcode collisions; to resolve them, HashMap uses equals() and stores the colliding elements in a linked list in the same bucket.
Fetch the data for key vaibhav: map.get(new Key("vaibhav"));
Steps:
Calculate the hash code of the Key {"vaibhav"}. It will be generated as 118.
Calculate the index by using the index method; it will be 6.
Go to index 6 of the array and compare the first element's key with the given key. If both are equal, return the value; otherwise, check the next element if it exists.
In our case, it is not found as the first element, and the next of the node object is not null.
If next of the node is null, return null.
If next of the node is not null, traverse to the second element and repeat from step 3 until the key is found or next is null.
A for loop is used for this retrieval process.
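As a hedged illustration, the numbers in the steps above (hash 118, index 6) are consistent with a Key class whose hashCode returns the first character: 'v' is 118, and 118 & (16 - 1) is 6 in a default 16-bucket table. The Key class below is an assumption chosen to match those numbers, not necessarily the original one:

```java
import java.util.HashMap;
import java.util.Map;

public class KeyLookupDemo {

    // Hypothetical Key class matching the numbers in the steps above:
    // "vaibhav".charAt(0) is 'v' == 118, and 118 & (16 - 1) == 6
    static class Key {
        private final String name;
        Key(String name) { this.name = name; }
        @Override public int hashCode() { return name.charAt(0); }
        @Override public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).name.equals(name);
        }
    }

    public static void main(String[] args) {
        Map<Key, String> map = new HashMap<>(); // default capacity 16
        map.put(new Key("vishal"), "20");  // also hashes to 118 -> same bucket
        map.put(new Key("vaibhav"), "40");
        // get walks the bucket's list, using equals to pick the right entry
        System.out.println(map.get(new Key("vaibhav"))); // 40
    }
}
```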
Upvotes: 18
Reputation: 120858
For the record, in Java 8 this loop is also present (sort of, since there are TreeNode
s as well):
if ((e = first.next) != null) {
    if (first instanceof TreeNode)
        return ((TreeNode<K,V>)first).getTreeNode(hash, key);
    do {
        if (e.hash == hash &&
            ((k = e.key) == key || (key != null && key.equals(k))))
            return e;
    } while ((e = e.next) != null);
}
Basically (for the case when the bin is not a tree), iterate the entire bin until you find the entry we are looking for.
Looking at this implementation, you can see why providing a good hash matters: with a poor hash, many entries end up in the same bucket, and searching takes longer.
Upvotes: 5
Reputation: 393846
table[indexFor(hash, table.length)]
is the bucket of the HashMap
that may contain the key we are looking for (if it is present in the Map
).
However, each bucket may contain multiple entries (either different keys having the same hashCode()
, or different keys with different hashCode()
that still got mapped to the same bucket), so you must iterate over these entries until you find the key you are looking for.
Since the expected number of entries in each bucket should be very small, this loop is still executed in expected O(1)
time.
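A concrete example of different keys with the same hashCode(): the strings "Aa" and "BB" famously share the hash 2112, so they always land in the same bucket, and it is exactly the loop's equals check that keeps them apart:

```java
import java.util.HashMap;
import java.util.Map;

public class StringCollisionDemo {
    public static void main(String[] args) {
        // "Aa" and "BB" are the classic String hashCode collision
        System.out.println("Aa".hashCode() == "BB".hashCode()); // true (both 2112)

        Map<String, Integer> map = new HashMap<>();
        map.put("Aa", 1);
        map.put("BB", 2);
        // Same hash, same bucket - but equals() in the loop keeps them apart
        System.out.println(map.get("Aa")); // 1
        System.out.println(map.get("BB")); // 2
        System.out.println(map.size());    // 2
    }
}
```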
Upvotes: 62