Reputation: 906
How can an element not be contained in the original set but in its unmodified copy?
The original set does not contain the element while its copy does. See image.
The following method returns true
, although it should always return false
. The implementation of c
and clusters
is in both cases HashSet
.
public static boolean confumbled(Set<String> c, Set<Set<String>> clusters) {
return (!clusters.contains(c) && new HashSet<>(clusters).contains(c));
}
Debugging has shown that the element is contained in the original, but Set.contains(element)
returns false
for some reason. See image.
Could somebody please explain to me what's going on?
Upvotes: 20
Views: 11068
Reputation: 65811
If the primary Set
was a TreeSet
(or perhaps some other NavigableSet
) then it is possible, if your objects are imperfectly compared, for this to happen.
The critical point is that HashSet.contains
looks like:
public boolean contains(Object o) {
return map.containsKey(o);
}
and map
is a HashMap
and HashMap.containsKey
looks like:
public boolean containsKey(Object key) {
return getNode(hash(key), key) != null;
}
so it uses the hashCode
of the key to check for presence.
A TreeSet
however uses a TreeMap
internally and it's containsKey
looks like:
final Entry<K,V> getEntry(Object key) {
// Offload comparator-based version for sake of performance
if (comparator != null)
return getEntryUsingComparator(key);
...
So it uses a Comparator
to find the key.
So, in summary, if your hashCode
method does not agree with your Comparator.compareTo
method (say compareTo
returns 1
while hashCode
returns different values) then you will see this kind of obscure behaviour.
class BadThing {
final int hash;
public BadThing(int hash) {
this.hash = hash;
}
@Override
public int hashCode() {
return hash;
}
@Override
public String toString() {
return "BadThing{" + "hash=" + hash + '}';
}
}
public void test() {
Set<BadThing> primarySet = new TreeSet<>(new Comparator<BadThing>() {
@Override
public int compare(BadThing o1, BadThing o2) {
return 1;
}
});
// Make the things.
BadThing bt1 = new BadThing(1);
primarySet.add(bt1);
BadThing bt2 = new BadThing(2);
primarySet.add(bt2);
// Make the secondary set.
Set<BadThing> secondarySet = new HashSet<>(primarySet);
// Have a poke around.
test(primarySet, bt1);
test(primarySet, bt2);
test(secondarySet, bt1);
test(secondarySet, bt2);
}
private void test(Set<BadThing> set, BadThing thing) {
System.out.println(thing + " " + (set.contains(thing) ? "is" : "NOT") + " in <" + set.getClass().getSimpleName() + ">" + set);
}
prints
BadThing{hash=1} NOT in <TreeSet>[BadThing{hash=1}, BadThing{hash=2}]
BadThing{hash=2} NOT in <TreeSet>[BadThing{hash=1}, BadThing{hash=2}]
BadThing{hash=1} is in <HashSet>[BadThing{hash=1}, BadThing{hash=2}]
BadThing{hash=2} is in <HashSet>[BadThing{hash=1}, BadThing{hash=2}]
so even though the object is in the TreeSet
it is not finding it because the comparator never returns 0
. However, once it is in the HashSet
all is fine because HashSet
uses hashCode
to find it and they behave in a valid way.
Upvotes: 8
Reputation: 393836
If you change an element in the Set
(in your case the elements are Set<String>
, so adding or removing a String will change them), Set.contains(element)
may fail to locate it, since the hashCode
of the element will be different than what it was when the element was first added to the HashSet
.
When you create a new HashSet
containing the elements of the original one, the elements are added based on their current hashCode
, so Set.contains(element)
will return true for the new HashSet
.
You should avoid putting mutable instances in a HashSet
(or using them as keys in a HashMap
), and if you can't avoid it, make sure you remove the element before you mutate it and re-add it afterwards. Otherwise your HashSet
will be broken.
An example :
Set<String> set = new HashSet<String>();
set.add("one");
set.add("two");
Set<Set<String>> setOfSets = new HashSet<Set<String>>();
setOfSets.add(set);
boolean found = setOfSets.contains(set); // returns true
set.add("three");
Set<Set<String>> newSetOfSets = new HashSet<Set<String>>(setOfSets);
found = setOfSets.contains(set); // returns false
found = newSetOfSets.contains(set); // returns true
Upvotes: 21
Reputation: 533530
The most common reason for this is that the element or key was altered after insertion resulting in a corruption of the underlying data structure.
note: when you add a reference to a Set<String>
to another Set<Set<String>>
you are adding a copy of the reference, the underlyingSet<String>
is not copied and if you alter it these changes which affect the Set<Set<String>>
you put it into.
e.g.
Set<String> s = new HashSet<>();
Set<Set<String>> ss = new HashSet<>();
ss.add(s);
assert ss.contains(s);
// altering the set after adding it corrupts the HashSet
s.add("Hi");
// there is a small chance it may still find it.
assert !ss.contains(s);
// build a correct structure by copying it.
Set<Set<String>> ss2 = new HashSet<>(ss);
assert ss2.contains(s);
s.add("There");
// not again.
assert !ss2.contains(s);
Upvotes: 8