gooDiVer
gooDiVer

Reputation: 43

Why is there no collision in a HashMap when different keys have the same hash code

I'm trying to create a collision intentionally.

 fun main(args: Array<String>) {
   val india = Country("India1", 1000)
   val india2 = Country("India2", 1000)

   val countryCapitalMap: HashMap<Country, String> = hashMapOf()
   countryCapitalMap.put(india, "Delhi1")
   countryCapitalMap.put(india2, "Delhi2")
 }


class Country(var name: String, var population: Long) {

  override fun hashCode(): Int {
      return if (name.length % 2 == 0) 31 else 95
  }

  override fun equals(obj: Any?): Boolean {
      val other = obj as Country?
      return if (name.equals(other!!.name, ignoreCase = true)) true else false
  }
}

So, I have india and india2 objects. I've overridden equals() and hashCode() methods for Country so that:

According to Collision resolution in Java HashMap and part of the article "Lets put this Country objects in hashmap", if key1 has result of hash(key.hashCode()) equal to the same operation on the key2 then there should be collision.

So, I put breakpoint to see content of countryCapitalMap and see that its size is 2. I.e. it contains two different entries and there is no linkedList. Hence, there is no collision.

My questions are:

Why countryCapitalMap has a size of 2? Why is there no collision?

Why doesn't HashMap creates a LinkedList with two entries with keys that is not equal but have the same hashCode?

Upvotes: 3

Views: 2017

Answers (1)

Alexander Ivanchenko
Alexander Ivanchenko

Reputation: 29058

You are confusing a collision - case when hashes of keys are the same (or to be precise, when hashes are withing the range corresponding to the same bucket of HashMap), with duplicated keys (i.e. keys that are identical according to the equals() method). These are two completely different situations.

Under the hood, HashMap maintains an array of buckets. Each bucket corresponds to a range of hash values.

If hashes of the keys collide (but keys are not equal) entries (Nodes to be precise) will be mapped to the same bucket and form a linked list, which after a certain threshold will turn into a tree.

Conversely, an attempt to add a new entry with a key that is already present in a map (i.e. duplicated key according to the equals() method) will result in updating the value of the existing entry.

Because as you've already observed, india.equals(india2) will return false, HashMap will contain two entries.

And since the hashes of india and india2 are identical, you've succeeded in your intention to create collision. Both entries will be added, both will end up in the same bucket (i.e. their nodes will form a linked list).

If you wonder how exactly this linked list looks like, take a look at the source code of the HashMap class. And you'll find that there's a field Node<K,V>[] table - an array of buckets, and there's an inner class Node:

static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;
}

And each node holds a reference next pointing to the next node mapped to the same bucket (if any).

Sidenotes :

  • The hash-function that you've provided in the example is a bad one. Sure, that's clear that you did that on purpose (to ensure that two keys would collide). But it's important to point out that the proper implementation of the equals/hashCode contract is crucial for every object that is intended to be used with Collections framework. And for hash-based collections like HashMap and HashSet the implementation of hashCode() is significant to perform well. If hash-function generates a lot of collisions, as a consequence many entries could appear in the same buckets meanwhile a large number of buckets could remain unoccupied. And depending on a load factor (ratio between the occupied and total number of buckets) your collection might never get resized which will lead to degradation of performance.
  • Another thing related to hashes that worth to mention is that the hashCode() of the key will be invoked only once, while a new node gets created. Take a look at the code above, and you'll notice that the field hash is marked as final. That's done on purpose, because computing hash some objects can be costful. Which means that if the state of this key changes a HashMap will be unaware of that change and will be unable to recognize the same key, i.e. if the previous and new hash will differ equals() method would not be invoked and HashMap will allow the same key to appear twice. That leads to another rule of thumb: an object that is used as a key should be immutable.

Upvotes: 17

Related Questions