Rincew1nd
Rincew1nd

Reputation: 196

HashSet.remove not working

I'm trying to remove an element from HashSet, but it won't.

When calling .contains(obj), it returns true, so .contains knows the object is in HashSet. But when I call .remove(obj), the object isn't removed from the HashSet and it returns false.

hashset object remove problem

Code of object - https://gist.github.com/rincew1nd/f97f11d21ba41d2f4197f9a9da02fea1

Upvotes: 3

Views: 8557

Answers (1)

FireLight
FireLight

Reputation: 325

This is because you haven't overriden Object#hashCode() in your class.

The implementation of hashCode() in Object returns a value that is computed from the calling objects memory address . So unless you pass a reference to the object contained in the map that you'd like to remove using remove(Object o) then the HashSet won't be able to locate the Object.
NOTE: Reference is not the same as a equal object defined by o1.equals(o2)

See bottom of post for explanation as to why 'HashSet#contains(Object o)' still works even when Objects being stored in the map haven't overriden Object#hashCode()


Say I have a class MyClass that doesn't override hashCode()
MyClass instance1 = new MyClass(); // Memory Address = 0x01
MyClass instance2 = new MyClass(); // Memory Address = 0x02
instance1.equals(instance2) == false

I can add these unique objects to a HashSet no problems, as adding to a HashSet only relies on equals(Object o), not hashCode().

Now let's say I want to remove them from the Set but I lost my original references to them, I do have the following references though:
instance3 // Memory Address = 0x03
instance4 // Memory Address = 0x04
instance1.equals(instance3) == true
instance2.equals(instance4) == true

A call to nonHashSet.remove(instance3) will remove instance1 from the Set. But Hash Based Sets work completely different: the hashCode (value computed from the memory address since we used Object#hashCode()) is used to locate the item. So when I ask the HashSet to remove instance3 it tells me that there is no element at that hash index. This is because the memory address of instance1 and instance3 are different, thus the hash for each is different meaning.

This is really bad as now the only reliable ways to remove instance1 or instance2 from the set is to clear the whole set or of course reinitialise it.


What happens under the hood in a HashSet:

When you attempt to add an element first the HashSet checks if another element considered equal to the element being added is already contained. If the element being added is unique to all other elements, it is added to the set.

How a HashSet differs from say a NonHashSet is how the elements are stored and retrieved.

In a HashSet, after an element is added to the set (as in when add(E element) returns true) a hash of that element is produced by calling element.hashCode(). In very loose terms you can think of the hash as the index that the element will be stored at.

Visual representation of storing elements in a hash based collection

Retrieved from https://en.wikipedia.org/wiki/Hash_table

Where key == element, hash function == element.hashCode(), buckets == indexes in the way I described it above

A note regarding Buckets: unlike traditional storage where each element gets its own index, it's not uncommon for two different elements (not considered equal to each other) to produce the same hash. When this happens a collision handling mechanism is used to store elements that have the same hash. This is beyond the scope of this question, but a common and simple way to handle this is to make a List at the collision hash index that stores all elements that contain the same hash.

When removing an element from a hash based set, the element that we are passing as the argument to be removed has it's hash calculated, then that hash is used to locate the element. If there is more than one element at the hash index (occurs when there are elements that have the same hash) then the collision handling mechanism the HashSet uses is used to find the exact item we are looking for. In the above specified mechanism (store a List at the collision index to store each collided element) elements.equals(collidedElement)

The general contract a hashCode() implementation should conform to: Hash Code Contract Java SE 8

Further information: Why do I need to override the equals and hashCode methods in Java?

Simple way to implement a EFFECTIVE hash function: Best implementation for hashCode method

How contains(Object o) Still Works:

Let's take a look at the code (Java Version 7u40-b43) that determines the return value ofcontains(Object o)in HashSet class. I've added comments to describe what is happening:

// contains(Object o) will return true if this method returns non-null
// NOTE: Entry<K,V> seems out of place as we are dealing with a Set not a 
//       Map. But in fact the backing structure for HashSet is a type of Map
final Entry<K,V> getEntry(Object key) {
        if (size == 0) { // Set is empty, search element can't be contained
            return null;
        }

        // Calculate the hash of the search key:
        // IF the search key == null --> hash = 0
        // ELSE calculate and assign the hashCode for key inside method 
        //      hash()
        int hash = (key == null) ? 0 : hash(key);

        // Start at hash calculated, after each iteration set e to be 
        // the element that comes after e  (defined by e.next field). Stop 
        // iterating when e is null
        for (Entry<K,V> e = table[indexFor(hash, table.length)];
             e != null;
             e = e.next) {
            Object k;
            // IF (the hash calculated from the search key and the hash 
            //     calculated from this Entries key are equal AND
            //     this Entries key equals the search key) OR
            //    (search key != null AND the search key equals this Entries 
            //     key)
            // then return entry --> contains returns true
            if (e.hash == hash &&
                ((k = e.key) == key || (key != null && key.equals(k))))
                return e;
        }
        return null;
    }

Upvotes: 8

Related Questions