Reputation: 1731

Why does changing the hashcode of an object used as a key in a HashMap make a lookup return null?

Consider the following scenario:

Object o1 = new Object();
Object o2 = new Object();

HashMap<Object, Object> map = new HashMap<Object, Object>();
map.put(o1, o2);

boolean test1 = map.get(o1) == o2; // This evaluates to true

// Now lets say we alter the state of o1:
o1.setSomeInternalState(Object newState);

boolean test2 = map.get(o1) == o2; // This evaluates to false, because now map.get(o1) returns null

Assume that the class for o1 has overridden equals() and hashCode().

I've encountered this issue during debugging because I had explicitly overridden equals and hashCode on one particular object I'm using in some business logic. I can fully appreciate why the hashcode of the object changes when I alter its state, but why should the map.get(o1) return null because of it? There is only one object, so shouldn't the key's hashcode match?

Upvotes: 4

Answers (7)

Bojan

Reputation: 23

As someone stated, it not advisable to modify keys in a HashMap (i.e. they should be immutable), but there are rare occasions where this may be absolutely required (key objects need to retain the same reference).

Java unfortunately does not provide a "rehash" method.

In that case there is a simple solution - replace the map, i.e. : map = new HashMap<>(map);

However (besides the fact that it is not optimal) if map also has to retain it's original reference, the following code may be helpful to someone:

private static <K, V> void rehashHashMapChangedKeys(HashMap<K, V> hashMap) {
    Iterator<Map.Entry<K, V>> itr = hashMap.entrySet().iterator();

    Set<Map.Entry<K, V>> toBeReAdded = new HashSet<>();
    while (itr.hasNext()) {
        Map.Entry<K, V> entry = itr.next();
        if (!hashMap.containsKey(entry.getKey())) {
            itr.remove();
            toBeReAdded.add(entry);
        }
    }

    for (Map.Entry<K, V> entry : toBeReAdded) {
        hashMap.put(entry.getKey(), entry.getValue());
    }
}

Upvotes: 0

supercat

Reputation: 81179

While the contract of hashCode() is frequently described in terms of the implementer, it's sometimes more useful to think of it from the standpoint of the caller: code which knows that two objects have returned different values for hashCode() is entitled to assume that they cannot possibly be equal. While many descriptions of hashing talk about bucket indices, the issue of hashing goes beyond that.

Basically, the purpose of hashCode() is to make it possible to quickly identify large numbers of things that can't possibly equal an item. While it's common to subdivide things into buckets whose hash codes meet various criteria (a bucket of things whose hash code meets some criterion can't contain anything whose hash code does not meet that criterion), that's not the only use for hash codes. If a collection class records the hash code of an items when they are added, it may check whether two instances contain the same sequence of items by first checking whether they contain the same sequence of hash values. If the hash values all match, then it will be necessary to examine the items individually, but if e.g. the hash values of each collections' fiftieth item differs, there's no reason to inspect the first 49 items in detail.

Thinking of hash codes in terms of the bold face statement above and its usage and contractual implications will be much clearer than would thinking of it in terms of buckets.

Upvotes: 1

August

Reputation: 12558

The HashMap class maps keys to values by running the hashCode of the key through a hash function. The hash function is used to create an index into an array of buckets. For example, a very primitive hash function would be hashCode % tableSize. Changing the key's hashCode would alter the index created by the hash function, meaning there is nothing to be found in that bucket.

Let's run an example, assuming that the initial hashCode is 15 and the table size is 4:

                         ┌----------------------┐
15 (initial hashCode) -> | hashCode % tableSize | -> index 3
                         |    (hash function)   |
                         └----------------------┘

So let's insert the value at index 3:

  ┌------┐
0 | null |
  |------|
1 | null |
  |------|
2 | null |
  |------|
3 | key! | <- insert
  └------┘

Now let's modifiy the key's hashCode so that it is now 13:

                          ┌----------------------┐
13 (modified hashCode) -> | hashCode % tableSize | -> index 1
                          |    (hash function)   |
                          └----------------------┘

What's at index 1? Nothing, null.

A lot of things have been simplified here. In a real hash table implementation, the hash function is much more complex to create a more even distribution. Also, the buckets are linked-lists so collisions can be handled.

Upvotes: 10

Brian Agnew

Reputation: 272307

The hashcode is used to store the object, and consequently to look it up. If you change the hashcode once you've stored the object, the chances are that your lookup will fail.

Implementation details may differ, but fundamentally a hash-based collection consists of a set of buckets of objects. The hashcode dictates which bucket within your hash-based collection the object is stored into (the equals() method then identifies the object within that bucket - if your collection is correctly scaled then there would only be one such object). When your hashcode changes, your lookup will most likely find a different bucket of items within the collection, and so your object appears to be missing.

It's for this reason that it's recommended to create a hashcode from the immutable fields of your object.

Note that you could change the hashcode and possibly still find your object. Your hashcode is an integer (a 32-bit number) and that maps to a much smaller set of buckets, usually (e.g. via some sort of calculation such as hashcode % 16, say). As such your hashcode could change, but the result of hashcode % 16 could give the same result, and hence the same bucket. This is implementation-dependent, obviously.

Upvotes: 1

user2717954

Reputation: 1902

map.get searches for an object whose hashcode is same as the object being looked for. since this 2 objects has different hashcode it will return null thinking this object is not in the map

final Entry<K,V> getEntry(Object key) {
    int hash = (key == null) ? 0 : hash(key);
    for (Entry<K,V> e = table[indexFor(hash, table.length)];
         e != null;
         e = e.next) {
        Object k;
        if (e.hash == hash &&
            ((k = e.key) == key || (key != null && key.equals(k))))
            return e;
    }
    return null;
}

if no object has hash s.t. e.hash == hash then null will be returned

Upvotes: 0

SMA

Reputation: 37033

That's how hashCode method you must have defined. So let's say we have employee class with two field:

class Employee {
    int id; 
    String name;
    public int hashCode() { 
        return name.hashCode() ^ id;
    }
}

Now if you just have set name, you might end up getting name's hashcode (and default id as 0 which would return hashCode of name) while if i later change id say even to 1 then it might create another hashCode xoring name's hashcode with 1.

Upvotes: 0

Hovercraft Full Of Eels

Reputation: 285405

You've stored it with one hashCode and are looking it up with another changed hashCode, so your program's behavior is as expected. This is why the contract for HashMap specificaly states that you should not use keys whose hashCodes can change. I'd follow this recommendation if I were you.

Upvotes: 1

Why does changing the hashcode of an object used as a key in a HashMap make a lookup return null?

Answers (7)

Related Questions