Membership testing with hash collision for sets in python

Question

Here is my class which I intentionally made it so that hash collision occurs:

class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def __repr__(self):
        return f'({self.name} - {self.age})'

    def __eq__(self, other):
        print('__eq__ called for', self)
        return False

    def __hash__(self):
        print('__hash__ called for', self)
        return 1


obj1 = Person('Soroush', 25)
obj2 = Person('John', 23)
print('
--------Creation of set---------
')
s = {obj1, obj2}
print(s)
print('
-------Membership testing-------
')
print(obj2 in s)

output:

--------Creation of set---------

__hash__ called for (Soroush - 25)
__hash__ called for (John - 23)
__eq__ called for (Soroush - 25)
{(Soroush - 25), (John - 23)}

-------Membership testing-------

__hash__ called for (John - 23)
__eq__ called for (Soroush - 25)
True

*My question is from "Membership testing" part of the output:

In obj2 in s statement, python first calculates the hash of obj2 (good, first line of output), then because obj1 and obj2 have the same hash, python is going to see if they are the same items or not so __eq__ of the obj1 is called(second line of the output). They are not ! I intentionally returned False for all instances. So python is continue checking the probe sequence. Here again the hashes are equal! why python doesn't call __eq__ of obj2 to see if those items are equal or not ?

I expected to see __eq__ called for (John - 23) in third line and also because obj2 is not equal to obj2 , last line of the output should be False. like:

print(obj2 == obj2)

output :

__eq__ called for (John - 23)
False

So, why there is no __eq__ called for (John - 23) and why True? They are not the same items(same thing happened a second ago with obj1) .

juanpa.arrivillaga · Accepted Answer

All built-in container types check object identity before testing (potentially expensive) object equality. This behavior is documented:

For container types such as list, tuple, set, frozenset, dict, or collections.deque, the expression x in y is equivalent to any(x is e or x == e for e in y).

Of course, that isn't how hash-based containers actually check membership. Again, this is an optimization, this spares potentially expensive equality operations, consider checking the equality of two large strings.

Here's a built-in that works the same way, the famously problematic floating point NaN:

>>> nan = float('nan')
>>> nan == nan
False
>>> nan is nan
True
>>> nan is float('nan')
False
>>> nan == float('nan')
False

And this results in the following behavior with sets:

>>> nan = float('nan')
>>> myset = {nan}
>>> nan in myset
True
>>> float('nan') in myset
False

Note, it is false when we create a new float object.

Membership testing with hash collision for sets in python

Answers (2)

Related Questions