Reputation: 8029
Given this program:
class Obj:
def __init__(self, a, b):
self.a = a
self.b = b
def __hash__(self):
return hash((self.a, self.b))
class Collection:
def __init__(self):
self.objs = set()
def add(self, obj):
self.objs.add(obj)
def find(self, a, b):
objs = []
for obj in self.objs:
if obj.b == b and obj.a == a:
objs.append(obj)
return objs
def remove(self, a, b):
for obj in self.find(a, b):
print('removing', obj)
self.objs.remove(obj)
o1 = Obj('a1', 'b1')
o2 = Obj('a2', 'b2')
o3 = Obj('a3', 'b3')
o4 = Obj('a4', 'b4')
o5 = Obj('a5', 'b5')
objs = Collection()
for o in (o1, o2, o3, o4, o5):
objs.add(o)
objs.remove('a1', 'b1')
o2.a = 'a1'
o2.b = 'b1'
objs.remove('a1', 'b1')
o3.a = 'a1'
o3.b = 'b1'
objs.remove('a1', 'b1')
o4.a = 'a1'
o4.b = 'b1'
objs.remove('a1', 'b1')
o5.a = 'a1'
o5.b = 'b1'
If I run this a few times with Python 3.4.2, sometimes it will succeed, other times it throws a KeyError after removing 2 or 3 objects:
$ python3 py_set_obj_remove_test.py
removing <__main__.Obj object at 0x7f3648035828>
removing <__main__.Obj object at 0x7f3648035860>
removing <__main__.Obj object at 0x7f3648035898>
removing <__main__.Obj object at 0x7f36480358d0>
$ python3 py_set_obj_remove_test.py
removing <__main__.Obj object at 0x7f156170b828>
removing <__main__.Obj object at 0x7f156170b860>
Traceback (most recent call last):
File "py_set_obj_remove_test.py", line 42, in <module>
objs.remove('a1', 'b1')
File "py_set_obj_remove_test.py", line 27, in remove
self.objs.remove(obj)
KeyError: <__main__.Obj object at 0x7f156170b860>
Is this a bug in Python? Or something about the implementation of sets I don't know about?
Interestingly, it seems to always fail at the second objs.remove()
call in Python 2.7.9.
Upvotes: 1
Views: 473
Reputation: 1123410
This is not a bug in Python, your code is violating a principle of sets: that the hash value must not change. By mutating your object attributes, the hash changes and the set can no longer reliably locate the object in the set.
From the __hash__
method documentation:
If a class defines mutable objects and implements an
__eq__()
method, it should not implement__hash__()
, since the implementation of hashable collections requires that a key’s hash value is immutable (if the object’s hash value changes, it will be in the wrong hash bucket).
Custom Python classes define a default __eq__
method that returns True when both operands reference the same object (obj1 is obj2
is true).
That it sometimes works in Python 3 is a property of hash randomisation for strings. Because the hash value for a string changes between Python interpreter runs, and because the modulus of a hash against the size of the hash table is used, you can end up with the right hash slot anyway, purely by accident, and then the ==
equality test will still be true because you didn't implement a custom __eq__
method.
Python 2 has hash randomisation too but it is disabled by default, but you could make your test 'pass' anyway by carefully picking the 'right' values for the a
and b
attributes.
Instead, you could make your code work by basing your hash on the id()
of your instance; that makes the hash value not change and would match the default __eq__
implementation:
def __hash__(self):
return hash(id(self))
You could also just remove your __hash__
implementation for the same effect, as the default implementation does basically the above (with the id()
value rotated by 4 bits to evade memory alignment patterns). Again, from the __hash__
documentation:
User-defined classes have
__eq__()
and__hash__()
methods by default; with them, all objects compare unequal (except with themselves) andx.__hash__()
returns an appropriate value such thatx == y
implies both thatx is y
andhash(x) == hash(y)
.
Alternatively, implement an __eq__
method that bases equality on equality of the attributes of the instance, and don't mutate the attributes.
Upvotes: 6
Reputation: 81654
You are changing the objects (ie changing the objects' hash) after they were added to the set.
When remove
is called, it can't find that hash in the set because it was changed after it was calculated (when the objects were originally added to the set).
Upvotes: 3