tkokasih
tkokasih

Reputation: 1198

Set that only needs equals

I'm curious, is there any Set that only requires .equals() to determine the uniqueness?

When looking at Set classes from java.util, I can only find HashSet which needs .hashCode() and TreeSet (or generally SortedSet) which requires Comparator. I cannot find any class that use only .equals().

Does it make sense that if I have .equals() method, it is sufficient to use it to determine object uniqueness? Thus have a Set implementation that only need to use .equals()? Or did I miss something here that .equals() are not sufficient to determine object uniqueness in Set implementation?

Note that I am aware of Java practice that if we override .equals(), we should override .hashCode() as well to maintain contract defined in Object.

Upvotes: 4

Views: 2691

Answers (3)

Boann
Boann

Reputation: 50041

On its own, the equals method is perfectly sufficient to implement a set correctly, but not to implement it efficiently.

The point of a hash code or a comparator is that they provide ways to arrange objects in some ordered structure (a hash table or a tree) which allows for fast finding of objects. If you have only the equals method for comparing pairs of objects, you can't arrange the objects in any meaningful or clever order; you have only a loose jumble of objects.

For example, with only the equals method, ensuring that objects in a set are unique requires comparing each added object to every other object in the jumble. Adding n objects requires
n * (n - 1) / 2 comparisons. For 5 objects that's 10 comparisons, which is fine, but for 1,000 objects that's 499,500 comparisons. It scales terribly.

Because it would not give scalable performance, no such set implementation is in the standard library.


If you don't care about hash table performance, this is a minimal implementation of the hashCode method which works for any class:

@Override
public int hashCode() {
    return 0; // or any other constant
}

Although it is required that equal objects have equal hash codes, it is never required for correctness that inequal objects have inequal hash codes, so returning a constant is legal. If you put these objects in a HashSet or use them as HashMap keys, they will end up in a jumble in a single hash table bucket. Performance will be bad, but it will work correctly.


Also, for what it's worth, a minimal working Set implementation which only ever uses the equals method would be:

public class ArraySet<E> extends AbstractSet<E> {
    private final ArrayList<E> list = new ArrayList<>();

    @Override
    public boolean add(E e) {
        if (!list.contains(e)) {
            list.add(e);
            return true;
        }
        return false;
    }

    @Override
    public Iterator<E> iterator() {
        return list.iterator();
    }

    @Override
    public int size() {
        return list.size();
    }
}

The set stores objects in an ArrayList, and uses list.contains to call equals on objects. Inherited methods from AbstractSet and AbstractCollection provide the bulk of the functionality of the Set interface; for example its remove method gets implemented via the list iterator's remove method. Each operation to add or remove an object or test an object's membership does a comparison against every object in the set, so it scales terribly, but works correctly.

Is this useful? Maybe, in certain special cases. For sets that are known to be very tiny, the performance might be fine, and if you have millions of these sets, this could save memory compared to a HashSet.

In general, though, it is better to write meaningful hash code methods and comparators, so you can have sets and maps that scale efficiently.

Upvotes: 5

Ekleog
Ekleog

Reputation: 1054

It would mathematically make sense to have a set that requires nothing but .equals().

But such an implementation would be so slow (linear time for every operation) that it has been decided that you can always give a hint.

Anyway, if there is really no way you can write a hashCode(), just make it always return 0 and you will have a structure that is as slow as the one you hoped for!

Upvotes: 2

You should always override hashCode() when you override equals(). The contract for Object clearly specifies that two equal objects have identical hash codes, and a surprising number of data structures and algorithms depend on this behavior. It's not difficult to add a hashCode(), and if you skip it now, you'll eventually get hard-to-diagnose bugs when your objects start getting put in hash-based structures.

Upvotes: 3

Related Questions