brain storm

Reputation: 31252

How does caching the hash code work in Java, as suggested by Joshua Bloch in Effective Java?

I have the following passage and piece of code from Effective Java by Joshua Bloch (Item 9, Chapter 3, page 49):

If a class is immutable and the cost of computing the hash code is significant, you might consider caching the hash code in the object rather than recalculating it each time it is requested. If you believe that most objects of this type will be used as hash keys, then you should calculate the hash code when the instance is created. Otherwise, you might choose to lazily initialize it the first time hashCode is invoked (Item 71). It is not clear that our PhoneNumber class merits this treatment, but just to show you how it’s done:

    // Lazily initialized, cached hashCode
    private volatile int hashCode;  // (See Item 71)
    @Override public int hashCode() {
        int result = hashCode;
        if (result == 0) {
            result = 17;
            result = 31 * result + areaCode;
            result = 31 * result + prefix;
            result = 31 * result + lineNumber;
            hashCode = result;
        }
        return result;
    }

My question is how the caching (remembering the hashCode) works here. The very first time the hashCode() method is called, there is no cached hashCode value to assign to result. A brief explanation of how this caching works would be great. Thanks.

Upvotes: 8

Views: 11039

Answers (4)

Stefan Steinegger

Reputation: 64628

This is a bit more detailed than the other answers, because many details have a specific meaning, especially in a multi-threaded environment.

// A field to hold the hash code per instance.
// It's volatile to make it visible to all threads. (Otherwise,
// it may happen that the hash code is calculated by one thread
// but is still not visible to other threads, because it remains in
// a cache of a single CPU core.)
private volatile int hashCode;  

@Override public int hashCode() {
    // Read the field into a local variable in order to not conflict 
    // with other threads while in this method.
    int result = hashCode;

    // Check the cached value. 0 means that the hash code has not been
    // calculated yet.
    if (result == 0) {

        // Calculate the hash code, keeping it in a local variable until
        // it's finished, so that other threads never read an incomplete
        // value.
        // It's possible that more than one thread calculates the hash code
        // simultaneously, which could be avoided by synchronizing.
        // In most applications, it performs better when no synchronization is
        // required to access an already calculated hash code, which is
        // expected to happen much more often than the unlikely case
        // that two (or more) threads calculate the hash code simultaneously.

        // Starting from 17 (rather than 0) makes it very unlikely that the
        // final value is 0. If it ever were 0, the code would still be
        // correct; the hash would just be recalculated on every call.
        result = 17;
        result = 31 * result + areaCode;
        result = 31 * result + prefix;
        result = 31 * result + lineNumber;

        // The hash code is written back to the volatile field to make it
        // available to future calls.
        hashCode = result;
    }

    // Return the hash code.
    return result;
}

Some final notes:

  • This implementation also works fine in single-threaded environments.
  • Hash codes should never change during the life cycle of an object. So make sure the calculation doesn't use fields that may change later (this always applies when calculating hash codes, but it matters even more when hash codes are cached).
  • The hash code calculation is still expected to be reasonably fast, because it may run more than once if several threads call hashCode() at the same time. If it is expensive, it may make sense to synchronize.
  • Don't try similar things with other fields. Multi-threading without synchronization is very risky and usually not worth the risk.
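To watch the caching happen, here is a minimal, self-contained sketch of the pattern above. The class name `CachedPhoneNumber`, the `computations` counter and the `main` method are assumptions added for illustration and are not part of the book's code; the counter exists only to show how often the hash is actually computed.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class: the name and the counter are illustrative only.
final class CachedPhoneNumber {
    private final short areaCode;
    private final short prefix;
    private final short lineNumber;

    // Not assigned explicitly, so it starts at 0 ("not computed yet").
    private volatile int hashCode;

    // Demo instrumentation: counts how often the hash is really computed.
    private final AtomicInteger computations = new AtomicInteger();

    CachedPhoneNumber(int areaCode, int prefix, int lineNumber) {
        this.areaCode = (short) areaCode;
        this.prefix = (short) prefix;
        this.lineNumber = (short) lineNumber;
    }

    int computations() {
        return computations.get();
    }

    @Override public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof CachedPhoneNumber)) return false;
        CachedPhoneNumber other = (CachedPhoneNumber) o;
        return areaCode == other.areaCode
                && prefix == other.prefix
                && lineNumber == other.lineNumber;
    }

    @Override public int hashCode() {
        int result = hashCode;          // one read of the volatile field
        if (result == 0) {              // nothing cached yet
            computations.incrementAndGet();
            result = 17;
            result = 31 * result + areaCode;
            result = 31 * result + prefix;
            result = 31 * result + lineNumber;
            hashCode = result;          // cache for later calls
        }
        return result;
    }

    public static void main(String[] args) {
        CachedPhoneNumber number = new CachedPhoneNumber(707, 867, 5309);
        int first = number.hashCode();   // computes and caches
        int second = number.hashCode();  // served from the cached field
        System.out.println(first == second);        // prints "true"
        System.out.println(number.computations());  // prints "1"
    }
}
```

The first call finds 0 in the field, runs the calculation and stores the result; the second call finds the stored value and skips the `if` body entirely.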

Upvotes: 0

Dennis

Reputation: 757

If you really wanted this to work right, you'd add another volatile boolean variable called isHashInvalid. Every setter that changes a value used in your hash function would set this variable to true. Then it becomes (no need to test for 0 now):

private volatile boolean isHashInvalid = true;
private volatile int hashCode; //Automatically zero but it doesn't matter

//You keep a member field on the class, which represents the cached hashCode value
@Override public int hashCode() {
    int result = hashCode;
    if (isHashInvalid) {
       result = 17;
       result = 31 * result + areaCode;
       result = 31 * result + prefix;
       result = 31 * result + lineNumber;
       //remember the value you computed in the hashCode member field
       hashCode = result;
       isHashInvalid = false;
    }
    }
    // when you return result, you've either just come from the body of the above
    // if statement, in which case you JUST calculated the value -- or -- you've
    // skipped the if statement in which case you've calculated it in a prior
    // invocation of hashCode, and you're returning the cached value.
    return result;
}
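Here is a minimal sketch of the invalidation idea for a mutable class. The class `MutablePhoneNumber`, its setter and the `computations` counter are hypothetical names introduced for illustration. Note that it swaps the two volatile fields for `synchronized` methods: a flag and a value that are written separately can be observed out of step by another thread, so this sketch updates them under one lock instead.

```java
// Hypothetical mutable class: all names here are illustrative only.
final class MutablePhoneNumber {
    private int areaCode;
    private int prefix;
    private int lineNumber;

    private int cachedHash;              // valid only while hashInvalid is false
    private boolean hashInvalid = true;  // true until the first computation
    private int computations;            // demo instrumentation

    MutablePhoneNumber(int areaCode, int prefix, int lineNumber) {
        this.areaCode = areaCode;
        this.prefix = prefix;
        this.lineNumber = lineNumber;
    }

    // Every setter that feeds the hash must invalidate the cached value.
    synchronized void setLineNumber(int lineNumber) {
        this.lineNumber = lineNumber;
        hashInvalid = true;
    }

    synchronized int computations() {
        return computations;
    }

    @Override public synchronized int hashCode() {
        if (hashInvalid) {
            computations++;
            int result = 17;
            result = 31 * result + areaCode;
            result = 31 * result + prefix;
            result = 31 * result + lineNumber;
            cachedHash = result;
            hashInvalid = false;
        }
        return cachedHash;
    }

    public static void main(String[] args) {
        MutablePhoneNumber number = new MutablePhoneNumber(707, 867, 5309);
        int before = number.hashCode();
        number.hashCode();                           // cached, no recomputation
        number.setLineNumber(5310);                  // invalidates the cache
        int after = number.hashCode();               // recomputed
        System.out.println(before != after);         // prints "true"
        System.out.println(number.computations());   // prints "2"
    }
}
```

Keep in mind that changing a field that feeds hashCode() while the object is stored as a key in a HashMap or HashSet means the collection can no longer find it, which is why the book's version is restricted to immutable classes.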

Upvotes: -1

Amir Afghani

Reputation: 38531

Simple. Read my embedded comments below...

private volatile int hashCode;
//You keep a member field on the class, which represents the cached hashCode value

   @Override public int hashCode() {
       int result = hashCode;
       //if result == 0, the hashCode has not been computed yet, so compute it
       if (result == 0) {
           result = 17;
           result = 31 * result + areaCode;
           result = 31 * result + prefix;
           result = 31 * result + lineNumber;
           //remember the value you computed in the hashCode member field
           hashCode = result;
       }
       // when you return result, you've either just come from the body of the above
       // if statement, in which case you JUST calculated the value -- or -- you've
       // skipped the if statement in which case you've calculated it in a prior
       // invocation of hashCode, and you're returning the cached value.
       return result;
   }

Upvotes: 12

rgettman

Reputation: 178263

The hashCode variable is an instance variable, and it's not initialized explicitly, so Java initializes it to 0 (JLS Section 4.12.5). The comparison result == 0 is in effect a check to see whether a (presumably non-zero) hash code has already been cached. If it hasn't been, the method performs the calculation; otherwise it just returns the previously computed hash code.
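The following single-threaded sketch illustrates both points: the field starts at 0 without any explicit assignment, and 0 is only "presumably" not a real hash. The class `ZeroSentinelDemo`, its simplified one-field hash and the counter are hypothetical and exist only for this demonstration (the field is not volatile here because no threads are involved).

```java
// Hypothetical demo class: names and the simplified hash are illustrative only.
final class ZeroSentinelDemo {
    private final int value;
    private int hashCode;      // never assigned explicitly, so it starts at 0
    private int computations;  // demo instrumentation: counts real computations

    ZeroSentinelDemo(int value) {
        this.value = value;
    }

    int cachedField() {
        return hashCode;
    }

    int computations() {
        return computations;
    }

    @Override public int hashCode() {
        int result = hashCode;
        if (result == 0) {               // 0 is read as "not computed yet"
            computations++;
            result = 31 * 17 + value;
            hashCode = result;
        }
        return result;
    }

    public static void main(String[] args) {
        ZeroSentinelDemo normal = new ZeroSentinelDemo(1);
        System.out.println(normal.cachedField());    // prints "0"
        normal.hashCode();
        normal.hashCode();
        System.out.println(normal.computations());   // prints "1"

        // 31 * 17 + (-527) == 0, so the real hash equals the sentinel.
        ZeroSentinelDemo unlucky = new ZeroSentinelDemo(-527);
        unlucky.hashCode();
        unlucky.hashCode();
        System.out.println(unlucky.computations());  // prints "2"
    }
}
```

When the real hash happens to be 0, the result is still correct; the object just never benefits from the cache and recomputes on every call.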

Upvotes: 2
