dottomato
dottomato

Reputation: 101

What does set() in Python use to test equality between objects?

I have written some code in Python which has a class called product and overrided the magic functions __eq__ and __hash__. Now I need to make a set which should remove duplicates from the list based on the ID of the product. As you can see the output of this code the hashes of two objects are the same, yet when i make a set of those two objects the length is 2 not one.

But, when i change the __eq__ method of the code to this

def __eq__(self, b) -> bool:
    if self.id == b.id:
        return True
    return False

and use it with the same hash function it works and the length of the set is 1. So i am confused whether the set data-structure uses the __eq__ method to test for equality or the __hash__ method.

Upvotes: 2

Views: 987

Answers (1)

water_ghosts
water_ghosts

Reputation: 736

Equality tests can be expensive, so the set starts by comparing hashes. If the hashes are not equal, then the check ends. If the hashes are equal, the set then tests for equality. If it only used __eq__, it might have to do a lot of unnecessary work, but if it only used __hash__, there would be no way to resolve a hash collision.

Here's a simple example of using equality to resolve a hash collision. All integers are their own hashes, except for -1:

>>> hash(-1)
-2
>>> hash(-2)
-2
>>> s = set()
>>> s.add(-1)
>>> -2 in s
False

Here's an example of the set skipping an equality check because the hashes aren't equal. Let's subclass an int so it return a new hash every second:

>>> class TimedInt(int):
...  def __hash__(self):
...   return int(time.time())
...
>>> a = TimedInt(5)
>>> a == 5
True
>>> a == a
True
>>> s = set()
>>> s.add(a)  # Now wait a few seconds...
>>> a in s
False

Upvotes: 2

Related Questions