Explaination for IntelliJ default hashCode() implementation

Question

I took a look at the IntelliJ default hashCode() implementation and was wondering, why they implemented it the way they did. I'm quite new to the hash concept and found some contradictory statements, that need clarification:

public int hashCode(){
  // creationDate is of type Date
  int result = this.creationDate != null ? this.creationDate.hashCode() : 0;
  // id is of type Long (wrapper class)
  result = 31 * result + (this.id != null ? this.id.hashCode() : 0);
  // code is of type String
  result = 31 * result + (this.code != null ? this.code.hashCode() : 0);
  // revision is of type int
  result = 31 * result + this.revision;
  return result;
}

Imo, the best source about this topic seemed to be this Java world article because I found their arguments most convincing. So I was wondering:

Among other arguments, above source states that multiplication is one of the slower operations. So, wouldn't it be better to skip the multiplication with a prime number whenever I call the hashCode() method of a reference type? Because most of the time this already includes such a multiplication.
Java world states that bitwise XOR ^ also improves the computation due to not mentioned reasons : ( What exactly might be an advantage in comparison to regular addition?
Wouldn't it be better to return different values when the respective class field is null? It would make the result more distinguishable, wouldn't it? Are there any huge disadvantages to use non-zero values?

Their example code looks more appealing to my eye, tbh:

  public boolean hashCode() {
    return
     (name  == null ? 17 : name.hashCode()) ^ 
     (birth == null ? 31 : name.hashCode());
  }

But I'm not sure if that's objectively true. I'm also a little bit suspicious of IntelliJ because their default code for equals(Object) compares by instanceof instead of comparing the instance classes directly. And I agree with that Java world article that this doesn't seem to fulfill the contract correctly.

Jiri Tousek · Accepted Answer

As for hashCode(), I would consider it more important to minimize collisions (two different objects having same hashCode()) than the speed of the hashCode() computation. Yes, the hashCode() should be fast (constant-time if possible), but for huge data structures using hashCode() (maps, sets etc.) the collisions are more important factor.

If your hashCode() function performs in constant time (independent on data and input size) and produces a good hashing function (few collisions), asymptotically the operations (get, contains, put) on map will perform in constant time.

If your hashCode() function produces a lot of collisions, the performance will suffer. In extreme case, you can always return 0 from hashCode() - the function itself will be super-fast, but the map operations will perform in linear time (i.e. growing with map size).

Multiplying the hashCode() before adding another field's sub-hashCode should usually provide for less collisions - this is a heuristic based on that often the fields contain similar data / small numbers.

Consider an example of class Person:

class Person {
    int age;
    int heightCm;
    int weightKg;
}

If you just added the numbers together to compute the hashCode, the result would be somewhere between 60 and 500 for all persons. If you multiply it the way Idea does, you will get hashCodes between 2000 and more than 100000 - much bigger space and therefore lower chance of collisions.

Using XOR is not a very good idea, for example if you have class Rectangle with fields height and width, all squares would have the same hashCode - 0.

As for equals() using instanceof vs. getClass().equals(), I've never seen a conclusive debate on this. Both have their advantages and disadvantages, and both ways can cause troubles if you're not careful:

If you use instanceof, any subclass that overrides your equals() will likely break the symmetry requirement
If you use getClass().equals(), this will not work well with some frameworks like Hibernate that produce their own subclasses of your classes to store their own technical information

Explaination for IntelliJ default hashCode() implementation

Answers (1)

Related Questions