Jack
Jack

Reputation: 6600

How to make an efficient hashCode?

I have three hashCode methods as follows, I prioritised them based on their efficiency. I am wondering if there is any other way to make a more efficient hashCode method.

1) public int hashCode() { //terrible
     return 5; 
   }
2) public int hashCode() { //a bit less terrible
     return name.length; 
   }
3) public int hashCode() { //better
     final int prime = 31;
     int result = 1;
     result = prime * result + ((name == null) ? 0 : name.hashCode());
     return result;
   }

Upvotes: 5

Views: 6503

Answers (5)

Marco13
Marco13

Reputation: 54611

In addition to the valuable answers so far, I'd like to add some other methods to consider:


3a):

public int hashCode() {
     return Objects.hashCode(name);
}

Not many pros/cons in terms of performance, but a bit more concise.


4.) You should either provide more information about the class that you are talking about, or reconsider your design. But using a class as the key of a hash map when the only property of this class is a String, then you might also be able to just use the String directly. So option 4 is:

// Changing this...
Map<Key, Value> map;
map.put(key, value);
Value value = map.get(key);

// ... to this:
Map<String, Value> map;
map.put(key.getName(), value);
Value value = map.get(key.getName());

(And if this is not possible, because the "name" of a Key might change after it has been created, you're in bigger trouble anyhow - see the next point)


5.) Maybe you can precompute the hash code. In fact, this is also done in the java.lang.String class:

public final class String
    implements java.io.Serializable, Comparable<String>, CharSequence {
    ...

    /** Cache the hash code for the string */
    private int hash; // Default to 0

But of course, this only makes sense for immutable classes. You should be aware of the fact that using mutable classes as keys of a Map is "dangerous" and may lead to consistency errors, and should only be done when you're absolutely sure that the instances that are used as keys won't change.

So if you want to use your class as the keys, and maybe your class even has more fields than just a single one, then you could store the hash code as a field:

class Key 
{
    private final String name;
    ... // Other fields...

    private final int hashCode;

    Key(String name, ...)
    {
        this.name = name;
        ... // Other fields

        // Pre-compute and store the hash code:
        this.hashCode = computeHashCode();
    }


    private int computeHashCode()
    {
        int result = 31;
        result = 31 * result + Objects.hashCode(name);
        result = 31 * result + ... // Other fields
        return result;
    }
}

Upvotes: 3

apangin
apangin

Reputation: 98284

Yes, there are better alternatives.

xxHash or MurmurHash3 are general-purpose hashing algorithms that are both faster and better in quality.

Upvotes: 0

OldCurmudgeon
OldCurmudgeon

Reputation: 65793

There is no surefire way to guarantee that your hashcode function is optimal because it is measured by two different metrics.

  • Efficiency - How quick it is to calculate.
  • Collisions - What is the chance of collision.

Your:

  1. Maximises efficiency at the expense of collisions.
  2. Finds a spot somwhere in the middle - but still not good.
  3. Least efficient but best for avoiding collisions - still not necessarily best.

You have to find the balance yourself.

Sometimes it is obvious when there is a very efficient method that never collides (e.g. the ordinal of an enum).

Sometimes memoising the values is a good solution - this way even a very inefficient method can be mitigated because it is only ever calculated once. There is an obvious emeory cost to this which also must be balanced.

Sometimes the overall functionality of your code contributes to your choice. Say you want to put File objects in a HashMap. A number of options are clear:

  1. Use the hashcode of the file name.
  2. Use the hashcode of the file path.
  3. Use a crc of the contents of the file.
  4. Use the hashcode of the SHA1 digest of the contents of the file.

Why collisions are bad

One of the main uses of hashcode is when inserting objects into a HashMap. The algorithm requests a hash code from the object and uses that to decide which bucket to put the object in. If the hash collides with another object there will be another object in that bucket, in which case the bucket will have to grow which costs time. If all hashes are unique then the map will be one item per bucket and thus maximally efficient.

See the excellent WikiPedia article on Hash Table for a deeper discussion on how HashMap works.

Upvotes: 6

Marko Topolnik
Marko Topolnik

Reputation: 200138

I prioritised them based on their efficiency

Your list is sorted by ascending efficiency—if by "efficiency" you mean the performance of your application as opposed to the latency of the hashCode method isolated from everything else. A hashcode with bad dispersion will result in a linear or near-linear search through a linked list inside HashMap, completely annulling the advantages of a hashtable.

Especially note that, on today's architectures, computation is much cheaper than pointer dereference, and it comes at a fixed low cost. A single cache miss is worth a thousand simple arithmetic operations and each pointer dereference is a potential cache miss.

Upvotes: 4

GhostCat
GhostCat

Reputation: 140407

My answer is going a different path - basically it is not answer, but a question: why do you worry about the performance of hashCode()?

Did you exhaustive profiling of your application and found that there is a performance problem originating from that one method on some of your objects?

If the answer to that question is "no" ... then - why do you think you need to worry about this one method? Why do you think that the default, generated by eclipse, probably used billions of times each day ... isn't good enough for you?

See here for explanations why it is in general a very bad idea to waste ones time with such questions.

Upvotes: 2

Related Questions