Andreas
Andreas

Reputation: 8029

Removing objects that have changed from a Python set

Given this program:

class Obj:
    def __init__(self, a, b):
        self.a = a
        self.b = b

    def __hash__(self):
        return hash((self.a, self.b))

class Collection:
    def __init__(self):
        self.objs = set()

    def add(self, obj):
        self.objs.add(obj)

    def find(self, a, b):
        objs = []

        for obj in self.objs:
            if obj.b == b and obj.a == a:
                objs.append(obj)
        return objs

    def remove(self, a, b):
        for obj in self.find(a, b):
            print('removing', obj)
            self.objs.remove(obj)

o1 = Obj('a1', 'b1')
o2 = Obj('a2', 'b2')
o3 = Obj('a3', 'b3')
o4 = Obj('a4', 'b4')
o5 = Obj('a5', 'b5')

objs = Collection()
for o in (o1, o2, o3, o4, o5):
    objs.add(o)

objs.remove('a1', 'b1')
o2.a = 'a1'
o2.b = 'b1'
objs.remove('a1', 'b1')
o3.a = 'a1'
o3.b = 'b1'
objs.remove('a1', 'b1')
o4.a = 'a1'
o4.b = 'b1'
objs.remove('a1', 'b1')
o5.a = 'a1'
o5.b = 'b1'

If I run this a few times with Python 3.4.2, sometimes it will succeed, other times it throws a KeyError after removing 2 or 3 objects:

$ python3 py_set_obj_remove_test.py
removing <__main__.Obj object at 0x7f3648035828>
removing <__main__.Obj object at 0x7f3648035860>
removing <__main__.Obj object at 0x7f3648035898>
removing <__main__.Obj object at 0x7f36480358d0>

$ python3 py_set_obj_remove_test.py
removing <__main__.Obj object at 0x7f156170b828>
removing <__main__.Obj object at 0x7f156170b860>
Traceback (most recent call last):
  File "py_set_obj_remove_test.py", line 42, in <module>
    objs.remove('a1', 'b1')
  File "py_set_obj_remove_test.py", line 27, in remove
    self.objs.remove(obj)
KeyError: <__main__.Obj object at 0x7f156170b860>

Is this a bug in Python? Or something about the implementation of sets I don't know about?

Interestingly, it seems to always fail at the second objs.remove() call in Python 2.7.9.

Upvotes: 1

Views: 473

Answers (2)

Martijn Pieters
Martijn Pieters

Reputation: 1123410

This is not a bug in Python, your code is violating a principle of sets: that the hash value must not change. By mutating your object attributes, the hash changes and the set can no longer reliably locate the object in the set.

From the __hash__ method documentation:

If a class defines mutable objects and implements an __eq__() method, it should not implement __hash__(), since the implementation of hashable collections requires that a key’s hash value is immutable (if the object’s hash value changes, it will be in the wrong hash bucket).

Custom Python classes define a default __eq__ method that returns True when both operands reference the same object (obj1 is obj2 is true).

That it sometimes works in Python 3 is a property of hash randomisation for strings. Because the hash value for a string changes between Python interpreter runs, and because the modulus of a hash against the size of the hash table is used, you can end up with the right hash slot anyway, purely by accident, and then the == equality test will still be true because you didn't implement a custom __eq__ method.

Python 2 has hash randomisation too but it is disabled by default, but you could make your test 'pass' anyway by carefully picking the 'right' values for the a and b attributes.

Instead, you could make your code work by basing your hash on the id() of your instance; that makes the hash value not change and would match the default __eq__ implementation:

def __hash__(self):
    return hash(id(self))

You could also just remove your __hash__ implementation for the same effect, as the default implementation does basically the above (with the id() value rotated by 4 bits to evade memory alignment patterns). Again, from the __hash__ documentation:

User-defined classes have __eq__() and __hash__() methods by default; with them, all objects compare unequal (except with themselves) and x.__hash__() returns an appropriate value such that x == y implies both that x is y and hash(x) == hash(y).

Alternatively, implement an __eq__ method that bases equality on equality of the attributes of the instance, and don't mutate the attributes.

Upvotes: 6

DeepSpace
DeepSpace

Reputation: 81654

You are changing the objects (ie changing the objects' hash) after they were added to the set.

When remove is called, it can't find that hash in the set because it was changed after it was calculated (when the objects were originally added to the set).

Upvotes: 3

Related Questions