Reputation: 1198
I'm curious, is there any Set
that only requires .equals()
to determine the uniqueness?
When looking at Set
classes from java.util
, I can only find HashSet
which needs .hashCode()
and TreeSet
(or generally SortedSet
) which requires Comparator
. I cannot find any class that use only .equals()
.
Does it make sense that if I have .equals()
method, it is sufficient to use it to determine object uniqueness? Thus have a Set
implementation that only need to use .equals()
? Or did I miss something here that .equals()
are not sufficient to determine object uniqueness in Set
implementation?
Note that I am aware of Java practice that if we override .equals()
, we should override .hashCode()
as well to maintain contract defined in Object
.
Upvotes: 4
Views: 2691
Reputation: 50041
On its own, the equals
method is perfectly sufficient to implement a set correctly, but not to implement it efficiently.
The point of a hash code or a comparator is that they provide ways to arrange objects in some ordered structure (a hash table or a tree) which allows for fast finding of objects. If you have only the equals
method for comparing pairs of objects, you can't arrange the objects in any meaningful or clever order; you have only a loose jumble of objects.
For example, with only the equals
method, ensuring that objects in a set are unique requires comparing each added object to every other object in the jumble. Adding n objects requires
n * (n - 1) / 2
comparisons. For 5 objects that's 10 comparisons, which is fine, but for 1,000 objects that's 499,500 comparisons. It scales terribly.
Because it would not give scalable performance, no such set implementation is in the standard library.
If you don't care about hash table performance, this is a minimal implementation of the hashCode
method which works for any class:
@Override
public int hashCode() {
return 0; // or any other constant
}
Although it is required that equal objects have equal hash codes, it is never required for correctness that inequal objects have inequal hash codes, so returning a constant is legal. If you put these objects in a HashSet
or use them as HashMap
keys, they will end up in a jumble in a single hash table bucket. Performance will be bad, but it will work correctly.
Also, for what it's worth, a minimal working Set
implementation which only ever uses the equals
method would be:
public class ArraySet<E> extends AbstractSet<E> {
private final ArrayList<E> list = new ArrayList<>();
@Override
public boolean add(E e) {
if (!list.contains(e)) {
list.add(e);
return true;
}
return false;
}
@Override
public Iterator<E> iterator() {
return list.iterator();
}
@Override
public int size() {
return list.size();
}
}
The set stores objects in an ArrayList
, and uses list.contains
to call equals
on objects. Inherited methods from AbstractSet
and AbstractCollection
provide the bulk of the functionality of the Set
interface; for example its remove
method gets implemented via the list iterator's remove
method. Each operation to add or remove an object or test an object's membership does a comparison against every object in the set, so it scales terribly, but works correctly.
Is this useful? Maybe, in certain special cases. For sets that are known to be very tiny, the performance might be fine, and if you have millions of these sets, this could save memory compared to a HashSet
.
In general, though, it is better to write meaningful hash code methods and comparators, so you can have sets and maps that scale efficiently.
Upvotes: 5
Reputation: 1054
It would mathematically make sense to have a set that requires nothing but .equals()
.
But such an implementation would be so slow (linear time for every operation) that it has been decided that you can always give a hint.
Anyway, if there is really no way you can write a hashCode()
, just make it always return 0 and you will have a structure that is as slow as the one you hoped for!
Upvotes: 2
Reputation: 77206
You should always override hashCode()
when you override equals()
. The contract for Object
clearly specifies that two equal objects have identical hash codes, and a surprising number of data structures and algorithms depend on this behavior. It's not difficult to add a hashCode()
, and if you skip it now, you'll eventually get hard-to-diagnose bugs when your objects start getting put in hash-based structures.
Upvotes: 3