Reputation: 196
I'm trying to remove an element from HashSet, but it won't.
When calling .contains(obj)
, it returns true, so .contains
knows the object is in HashSet
. But when I call .remove(obj)
, the object isn't removed from the HashSet
and it returns false.
Code of object - https://gist.github.com/rincew1nd/f97f11d21ba41d2f4197f9a9da02fea1
Upvotes: 3
Views: 8557
Reputation: 325
This is because you haven't overriden Object#hashCode()
in your class.
The implementation of hashCode()
in Object returns a value that is computed from the calling objects memory address . So unless you pass a reference to the object contained in the map that you'd like to remove using remove(Object o)
then the HashSet won't be able to locate the Object.
NOTE: Reference is not the same as a equal object defined by o1.equals(o2)
See bottom of post for explanation as to why 'HashSet#contains(Object o)' still works even when Objects being stored in the map haven't overriden Object#hashCode()
Say I have a class MyClass
that doesn't override hashCode()
MyClass instance1 = new MyClass(); // Memory Address = 0x01
MyClass instance2 = new MyClass(); // Memory Address = 0x02
instance1.equals(instance2) == false
I can add these unique objects to a HashSet no problems, as adding to a HashSet only relies on equals(Object o)
, not hashCode()
.
Now let's say I want to remove them from the Set but I lost my original references to them, I do have the following references though:
instance3 // Memory Address = 0x03
instance4 // Memory Address = 0x04
instance1.equals(instance3) == true
instance2.equals(instance4) == true
A call to nonHashSet.remove(instance3)
will remove instance1
from the Set. But Hash Based Sets work completely different: the hashCode (value computed from the memory address since we used Object#hashCode()
) is used to locate the item. So when I ask the HashSet to remove instance3
it tells me that there is no element at that hash index. This is because the memory address of instance1 and instance3 are different, thus the hash for each is different meaning.
This is really bad as now the only reliable ways to remove instance1
or instance2
from the set is to clear the whole set or of course reinitialise it.
HashSet
:When you attempt to add an element first the HashSet checks if another element considered equal to the element being added is already contained. If the element being added is unique to all other elements, it is added to the set.
How a HashSet
differs from say a NonHashSet
is how the elements are stored and retrieved.
In a HashSet
, after an element is added to the set (as in when add(E element)
returns true) a hash of that element is produced by calling element.hashCode()
. In very loose terms you can think of the hash as the index that the element will be stored at.
Retrieved from https://en.wikipedia.org/wiki/Hash_table
Where key == element
, hash function == element.hashCode()
, buckets == indexes
in the way I described it above
A note regarding Buckets: unlike traditional storage where each element gets its own index, it's not uncommon for two different elements (not considered equal to each other) to produce the same hash. When this happens a collision handling mechanism is used to store elements that have the same hash. This is beyond the scope of this question, but a common and simple way to handle this is to make a List at the collision hash index that stores all elements that contain the same hash.
When removing an element from a hash based set, the element that we are passing as the argument to be removed has it's hash calculated, then that hash is used to locate the element. If there is more than one element at the hash index (occurs when there are elements that have the same hash) then the collision handling mechanism the HashSet uses is used to find the exact item we are looking for. In the above specified mechanism (store a List at the collision index to store each collided element) elements.equals(collidedElement)
The general contract a hashCode()
implementation should conform to:
Hash Code Contract Java SE 8
Further information: Why do I need to override the equals and hashCode methods in Java?
Simple way to implement a EFFECTIVE hash function: Best implementation for hashCode method
contains(Object o)
Still Works:Let's take a look at the code (Java Version 7u40-b43) that determines the return value ofcontains(Object o)
in HashSet class. I've added comments to describe what is happening:
// contains(Object o) will return true if this method returns non-null
// NOTE: Entry<K,V> seems out of place as we are dealing with a Set not a
// Map. But in fact the backing structure for HashSet is a type of Map
final Entry<K,V> getEntry(Object key) {
if (size == 0) { // Set is empty, search element can't be contained
return null;
}
// Calculate the hash of the search key:
// IF the search key == null --> hash = 0
// ELSE calculate and assign the hashCode for key inside method
// hash()
int hash = (key == null) ? 0 : hash(key);
// Start at hash calculated, after each iteration set e to be
// the element that comes after e (defined by e.next field). Stop
// iterating when e is null
for (Entry<K,V> e = table[indexFor(hash, table.length)];
e != null;
e = e.next) {
Object k;
// IF (the hash calculated from the search key and the hash
// calculated from this Entries key are equal AND
// this Entries key equals the search key) OR
// (search key != null AND the search key equals this Entries
// key)
// then return entry --> contains returns true
if (e.hash == hash &&
((k = e.key) == key || (key != null && key.equals(k))))
return e;
}
return null;
}
Upvotes: 8