StevenWilkins
StevenWilkins

Reputation: 1323

Java hashcode: Object vs composite

I've stumbled on this method in some code whose only purpose is to create a String Key for a HashMap (EDIT: In my case all of X, Y and Z will be numeric, consider them co-ordinates if that makes it easier):

protected String createMappingKey(String x, String y, String z) {
        return x+":"+y+":"+z;
}

Something about this code is not sitting right and I think it would be better to be replaced with an object like so (note that this code has been generated by my IDE so you can change the implementation however you'd like):

public static class MyKey {
        String x,y,z;

        // Constructor(s), maybe getters and setters too here

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (o == null || getClass() != o.getClass()) return false;

            MyKey myKey = (MyKey) o;

            if (x != null ? !x.equals(myKey.x) : myKey.x != null) return false;
            if (y != null ? !y.equals(myKey.y) : myKey.y != null) return false;
            if (z != null ? !z.equals(myKey.z) : myKey.z != null) return false;

            return true;
        }

        @Override
        public int hashCode() {
            int result = x != null ? x.hashCode() : 0;
            result = 31 * result + (y != null ? y.hashCode() : 0);
            result = 31 * result + (z != null ? z.hashCode() : 0);
            return result;
        }
    }

But this seems like an awful lot of code for not a lot of value. I'm pretty sure there will only be a negligible difference in the number of collisions between these two approaches.

Which of these approaches would you prefer and why? Is there a better approach that I'm missing?

If one of these approaches will have a significant number of collisions more than the other then that would also interest me and I'll open a separate question to deal with that.

Upvotes: 4

Views: 1665

Answers (9)

Costi Ciudatu
Costi Ciudatu

Reputation: 38255

I have the feeling that the missing part of your code sample would be more helpful than what you posted. Why exactly do you need a String key ? Those x, y, z numbers, if they're related, would be better placed in some meaningful bean which you can use as map key (and which could also offer some extra functionality).

If X, Y, Z are really coordinates, than it means that you simply need a Point3D as key. Make that immutable and just use it, why bother creating string patterns only for map keys ? If you need that {x}:{y}:{z} representation for something, you can return that from Point3D.toString(). And you may also reconstruct a point from the string representation with some static valueOf(String) factory method.

Upvotes: 0

ILMTitan
ILMTitan

Reputation: 11047

The approach I tend to use when I want to create a key, but don't feel a full key class is warranted, is to create a List using Arrays.asList.

protected List<Object> createMappingKey(String x, String y, String z) {
    return Arrays.<Object>asList(x, y, z);
}

The danger of a String is that equals could collide if your elements use the character you also used as a separator. A list ensures that such a collision can not happen. It also has the benefit of working with any object with a correct equals/hashCode implementation, not just Strings and objects with equals-compatable toString implementations.

If you do want to create a key class, you can use Appache Commons EqualsBuilder and HashCodeBuilder to significantly shorted your hashCode and equals classes.

Upvotes: 3

mhaller
mhaller

Reputation: 14222

The createMappingKey() method creates the String only once. Hashcode, which is used for Maps, is calculated only once for the String, since it's immutable. But the method creates some StringBuilders to perform the concatenation. The string-representation itself is created only to later represent a hashcode in the HashMap. So it's a throw-away-product without any value.

You can make it have value and serve other purposes as well. Your MyKey class can be immutable too, so you can precalculate your own hashCode up front and cache it. That has the potential to improve the performance when the HashMap is accessed or modified very often, because the calls to equals()/hashCode() are faster.

A very own class for the key sounds like better object orientation, but of course introduces additional source code and clutter. But it probably expresses better what the key really is: Is it just a concatenation of Strings? I doubt it. It might as well be an identifier for a customer or something like that, so why not call it like that?

public class CustomerIdentifier 
{

 public CustomerIdentifier(String tenantId,
                           String customerName,
                           String location) 
 {
  // Validate tenantId, customerName, location
  // e.g. non-null, minimum length, uppercase/lowercase
  // pre-calculate hashCode
  // and throw away values if we don't need them 
 }

 // equals() and hashCode() go here
 public long hashCode() { return precalc; };
}



map.put(new CustomerIdentifier("a","b","c), customer);

This approach let's you define in your project how a Customer object is identified. It encapsulates the key-generation - you're independent of whether you have to use a : as delimiter or something else. Other clients (e.g. if the HashMap is accessed by other code not under your control) may have it easier, because they just use the CustomerIdentifier instead of duplicating the key-generation code (it's not much, but the problem may often come later, when someone changes the delimiter and the other code doesn't...)

Also, this approach let's you put the validation of the x/y/z values into a class. Can they be null? Can they be colons? How do you distinguish them?

x=':' y=''  z=''    is       ::::
x=''  y=':' z=''    is       ::::
x=''  y=''  z=':'   is       ::::

The downside is that it's probably harder to debug. In such cases, you would want to implement toString() in that class as well, so you can faster browse through the hashmap in the debugger view and find the key you're looking for.

The second downside is that you may often have to create new key objects for code which accesses the HashMap, but only has the x/y/z values at hand.

Upvotes: 0

villintehaspam
villintehaspam

Reputation: 8734

The key being generated by createMappingKey will be used as a key in the hash table - this means that hashCode() will be called on it. So in fact, your key hashCode is likely worse than using the method. If you would like to make your code clearer by introducing a separate key class that's fine, but please just use

    @Override
    public int hashCode() {
        return (x+":"+y+":"+z).hashCode();
    }

So much simpler. Note that you may want to consider making sure that the delimiter chosen (in this case ":" does not occur in the strings x,y,z or else different strings could create the same hashCode.

Upvotes: 0

Mike Daniels
Mike Daniels

Reputation: 8642

I don't believe that your MyKey approach above would result in noticeably different behavior from a HashMap versus using a key generated with createMappingKey. Both MyKey and String have correct equals and hashCode behavior, so from a HashMap's perspective, there's no difference between the two.

I do think that using MyKey as your HashMap key instead of a plain old String makes it a little clearer to another programmer that there are strict rules for what constitutes a correct key, so that they don't do something goofy like stick in a "" key in the HashMap (either on purpose or by accident).

Upvotes: 0

Konstantin Komissarchik
Konstantin Komissarchik

Reputation: 29139

If your strings are relative short, then I would prefer the composite string key approach. The reason is that you aren't introducing code complexity. Do you really need to write/debug/maintain your composite key class and its equals/hashCode methods? On the other hand, if strings are not short, then the class-based approach is better as it will yield lower memory footprint.

Upvotes: 0

SLaks
SLaks

Reputation: 887867

Your code will be faster than a string.

You can simplify it by moving the logic into a common base class, using a Java equivalent of this technique. (using an abstract method which returns an array of property values)

It would look something like

abstract class EqualityComparable {
    protected abstract Object[] keys();

    @Override
    public boolean equals(Object o) { ... }
    @Override
    public int hashCode() { ... }
}

Upvotes: 0

Kylar
Kylar

Reputation: 9334

I hate to say it, but your answer is worse - it will use 4 times the number of object references (one for the key, plus one for each string = 4), plus it has to separately calculate 3 hashcodes plus additional processing for each lookup.

Upvotes: 0

Peter Lawrey
Peter Lawrey

Reputation: 533740

You could try a different separator like the following assuming there is no nul bytes in your string. A colon ':' is more likely to occur ??

protected String createMappingKey(String x, String y, String z) {
    return x+'\0'+y+'\0'+z;
}

Like you said, a lot of work for very little difference.

One downside is that it will treat null and "null" as the same. You have to judge whether this is likely to be a problem.

Upvotes: 1

Related Questions