Reputation: 101
I have written some code in Python which has a class called product and overrided the magic functions __eq__
and __hash__
. Now I need to make a set which should remove duplicates from the list based on the ID
of the product. As you can see the output of this code the hashes of two objects are the same, yet when i make a set of those two objects the length is 2 not one.
But, when i change the __eq__
method of the code to this
def __eq__(self, b) -> bool:
if self.id == b.id:
return True
return False
and use it with the same hash function it works and the length of the set is 1. So i am confused whether the set data-structure uses the __eq__
method to test for equality or the __hash__
method.
Upvotes: 2
Views: 987
Reputation: 736
Equality tests can be expensive, so the set starts by comparing hashes. If the hashes are not equal, then the check ends. If the hashes are equal, the set then tests for equality. If it only used __eq__
, it might have to do a lot of unnecessary work, but if it only used __hash__
, there would be no way to resolve a hash collision.
Here's a simple example of using equality to resolve a hash collision. All integers are their own hashes, except for -1
:
>>> hash(-1)
-2
>>> hash(-2)
-2
>>> s = set()
>>> s.add(-1)
>>> -2 in s
False
Here's an example of the set skipping an equality check because the hashes aren't equal. Let's subclass an int
so it return a new hash every second:
>>> class TimedInt(int):
... def __hash__(self):
... return int(time.time())
...
>>> a = TimedInt(5)
>>> a == 5
True
>>> a == a
True
>>> s = set()
>>> s.add(a) # Now wait a few seconds...
>>> a in s
False
Upvotes: 2