Reputation: 16536
Here is my class which I intentionally made it so that hash collision occurs:
class Person:
def __init__(self, name, age):
self.name = name
self.age = age
def __repr__(self):
return f'({self.name} - {self.age})'
def __eq__(self, other):
print('__eq__ called for', self)
return False
def __hash__(self):
print('__hash__ called for', self)
return 1
obj1 = Person('Soroush', 25)
obj2 = Person('John', 23)
print('\n--------Creation of set---------\n')
s = {obj1, obj2}
print(s)
print('\n-------Membership testing-------\n')
print(obj2 in s)
output:
--------Creation of set---------
__hash__ called for (Soroush - 25)
__hash__ called for (John - 23)
__eq__ called for (Soroush - 25)
{(Soroush - 25), (John - 23)}
-------Membership testing-------
__hash__ called for (John - 23)
__eq__ called for (Soroush - 25)
True
*My question is from "Membership testing" part of the output:
In obj2 in s
statement, python first calculates the hash of obj2
(good, first line of output), then because obj1
and obj2
have the same hash, python is going to see if they are the same items or not so __eq__
of the obj1
is called(second line of the output). They are not ! I intentionally returned False
for all instances. So python is continue checking the probe sequence. Here again the hashes are equal! why python doesn't call __eq__
of obj2
to see if those items are equal or not ?
I expected to see __eq__ called for (John - 23)
in third line and also because obj2
is not equal to obj2
, last line of the output should be False
. like:
print(obj2 == obj2)
output :
__eq__ called for (John - 23)
False
So, why there is no __eq__ called for (John - 23)
and why True
? They are not the same items(same thing happened a second ago with obj1
) .
Upvotes: 3
Views: 197
Reputation: 96246
All built-in container types check object identity before testing (potentially expensive) object equality. This behavior is documented:
For container types such as list, tuple, set, frozenset, dict, or collections.deque, the expression x in y is equivalent to
any(x is e or x == e for e in y)
.
Of course, that isn't how hash-based containers actually check membership. Again, this is an optimization, this spares potentially expensive equality operations, consider checking the equality of two large strings.
Here's a built-in that works the same way, the famously problematic floating point NaN:
>>> nan = float('nan')
>>> nan == nan
False
>>> nan is nan
True
>>> nan is float('nan')
False
>>> nan == float('nan')
False
And this results in the following behavior with sets:
>>> nan = float('nan')
>>> myset = {nan}
>>> nan in myset
True
>>> float('nan') in myset
False
Note, it is false when we create a new float
object.
Upvotes: 4
Reputation: 9047
So, why there is no __eq__ called for (John - 23)
This can be easily achieved by using for in
for i in s:
print(i)
if i == obj2:
print('equal', i)
-------Membership testing-------
(John - 23)
__eq__ called for (John - 23) (John - 23)
(Soroush - 25)
__eq__ called for (Soroush - 25) (John - 23)
But obviously, this is not what you want, let's add another person and try the in
operator
obj1 = Person('Soroush', 25)
obj2 = Person('John', 23)
obj3 = Person('Kite', 23)
print('\n--------Creation of set---------\n')
s = {obj3, obj2, obj1}
print(s)
print('\n-------Membership testing-------\n')
print(obj2 in s)
-------Membership testing-------
__hash__ called for (John - 23)
__eq__ called for (Kite - 23) (John - 23)
True
Now by changing the set element position
.
.
s = {obj3, obj1, obj2}
.
.
__hash__ called for (John - 23)
__eq__ called for (Kite - 23) (John - 23)
__eq__ called for (Soroush - 25) (John - 23)
True
From this, I came to the conclusion that the True
is coming from the check of some parameter
for the 3rd element (last example). And that is most obviously the id
or memory location in case CPython. Because for this case
.
.
print(Person('John', 23) in s)
.
.
__hash__ called for (John - 23)
__eq__ called for (Kite - 23) (John - 23)
__eq__ called for (Soroush - 25) (John - 23)
__eq__ called for (John - 23) (John - 23)
False
You get false, because id(Person('John', 23))
is obviously not same as obj2
. Similar behavior can be reproduced in for in
instead of ==
use is
obj1 = Person('Soroush', 25)
obj2 = Person('John', 23)
print('\n--------Creation of set---------\n')
s = {obj2, obj1}
print(s)
print('\n-------Membership testing-------\n')
for i in s:
print(i)
if i is obj2:
print('equal', i)
print(obj2 in s)
-------Membership testing-------
(John - 23)
equal (John - 23) ------------------------
(Soroush - 25) |
|
|--you are missing this output in `in` check
__hash__ called for (John - 23) |
True -----------------------------------------|
Upvotes: 1