wehnsdaefflae
wehnsdaefflae

Reputation: 906

Element is present but `Set.contains(element)` returns false

How can an element not be contained in the original set but in its unmodified copy?

The original set does not contain the element while its copy does. See image.

The following method returns true, although it should always return false. The implementation of c and clusters is in both cases HashSet.

public static boolean confumbled(Set<String> c, Set<Set<String>> clusters) {
    return (!clusters.contains(c) && new HashSet<>(clusters).contains(c));
}

Debugging has shown that the element is contained in the original, but Set.contains(element) returns false for some reason. See image.

Could somebody please explain to me what's going on?

Upvotes: 20

Views: 11068

Answers (3)

OldCurmudgeon
OldCurmudgeon

Reputation: 65811

If the primary Set was a TreeSet (or perhaps some other NavigableSet) then it is possible, if your objects are imperfectly compared, for this to happen.

The critical point is that HashSet.contains looks like:

public boolean contains(Object o) {
    return map.containsKey(o);
}

and map is a HashMap and HashMap.containsKey looks like:

public boolean containsKey(Object key) {
    return getNode(hash(key), key) != null;
}

so it uses the hashCode of the key to check for presence.

A TreeSet however uses a TreeMap internally and it's containsKey looks like:

final Entry<K,V> getEntry(Object key) {
    // Offload comparator-based version for sake of performance
    if (comparator != null)
        return getEntryUsingComparator(key);
    ...

So it uses a Comparator to find the key.

So, in summary, if your hashCode method does not agree with your Comparator.compareTo method (say compareTo returns 1 while hashCode returns different values) then you will see this kind of obscure behaviour.

class BadThing {

    final int hash;

    public BadThing(int hash) {
        this.hash = hash;
    }

    @Override
    public int hashCode() {
        return hash;
    }

    @Override
    public String toString() {
        return "BadThing{" + "hash=" + hash + '}';
    }

}

public void test() {
    Set<BadThing> primarySet = new TreeSet<>(new Comparator<BadThing>() {

        @Override
        public int compare(BadThing o1, BadThing o2) {
            return 1;
        }
    });
    // Make the things.
    BadThing bt1 = new BadThing(1);
    primarySet.add(bt1);
    BadThing bt2 = new BadThing(2);
    primarySet.add(bt2);
    // Make the secondary set.
    Set<BadThing> secondarySet = new HashSet<>(primarySet);
    // Have a poke around.
    test(primarySet, bt1);
    test(primarySet, bt2);
    test(secondarySet, bt1);
    test(secondarySet, bt2);
}

private void test(Set<BadThing> set, BadThing thing) {
    System.out.println(thing + " " + (set.contains(thing) ? "is" : "NOT") + " in <" + set.getClass().getSimpleName() + ">" + set);
}

prints

BadThing{hash=1} NOT in <TreeSet>[BadThing{hash=1}, BadThing{hash=2}]
BadThing{hash=2} NOT in <TreeSet>[BadThing{hash=1}, BadThing{hash=2}]
BadThing{hash=1} is in <HashSet>[BadThing{hash=1}, BadThing{hash=2}]
BadThing{hash=2} is in <HashSet>[BadThing{hash=1}, BadThing{hash=2}]

so even though the object is in the TreeSet it is not finding it because the comparator never returns 0. However, once it is in the HashSet all is fine because HashSet uses hashCode to find it and they behave in a valid way.

Upvotes: 8

Eran
Eran

Reputation: 393836

If you change an element in the Set (in your case the elements are Set<String>, so adding or removing a String will change them), Set.contains(element) may fail to locate it, since the hashCode of the element will be different than what it was when the element was first added to the HashSet.

When you create a new HashSet containing the elements of the original one, the elements are added based on their current hashCode, so Set.contains(element) will return true for the new HashSet.

You should avoid putting mutable instances in a HashSet (or using them as keys in a HashMap), and if you can't avoid it, make sure you remove the element before you mutate it and re-add it afterwards. Otherwise your HashSet will be broken.

An example :

Set<String> set = new HashSet<String>();
set.add("one");
set.add("two");
Set<Set<String>> setOfSets = new HashSet<Set<String>>();
setOfSets.add(set);
boolean found = setOfSets.contains(set); // returns true
set.add("three");
Set<Set<String>> newSetOfSets = new HashSet<Set<String>>(setOfSets);
found = setOfSets.contains(set); // returns false
found = newSetOfSets.contains(set); // returns true

Upvotes: 21

Peter Lawrey
Peter Lawrey

Reputation: 533530

The most common reason for this is that the element or key was altered after insertion resulting in a corruption of the underlying data structure.

note: when you add a reference to a Set<String> to another Set<Set<String>> you are adding a copy of the reference, the underlyingSet<String> is not copied and if you alter it these changes which affect the Set<Set<String>> you put it into.

e.g.

Set<String> s = new HashSet<>();
Set<Set<String>> ss = new HashSet<>();
ss.add(s);
assert ss.contains(s);

// altering the set after adding it corrupts the HashSet
s.add("Hi");
// there is a small chance it may still find it.
assert !ss.contains(s);

// build a correct structure by copying it.
Set<Set<String>> ss2 = new HashSet<>(ss);
assert ss2.contains(s);

s.add("There");
// not again.
assert !ss2.contains(s);

Upvotes: 8

Related Questions