Danqi Wang

Reputation: 1637

Passing a dict-like Object through multiprocessing.Queue Makes it Unable to be Modified by Attribute

I am not sure the title describes the problem accurately, so here is the code:

import os
from multiprocessing import JoinableQueue

# A dict-like class, but is able to be accessed by attributes.
# example: d = AttrDict({'a': 1, 'b': 2})
# d.a is equivalent to d['a']
class AttrDict(dict):
    def __init__(self, *args, **kwargs):
        super(AttrDict, self).__init__(*args, **kwargs)
        self.__dict__ = self


queue = JoinableQueue()
pid = os.fork()

if pid == 0:
    d = AttrDict({'a': 1, 'b': 2})
    queue.put(d)
    queue.join()
    os._exit(0)
else:
    d = queue.get()
    queue.task_done()
    #d = AttrDict(d.items())  #(1)
    d.a = 3                   #(2)
    #d['a'] = 3               #(3)
    print d

The above code prints {'a': 1, 'b': 2}, which means (2) has no effect.

If I change (2) to (3), or enable (1), then the output is {'a': 3, 'b': 2}, which is expected.

It seems something happens to d when it is passed through the queue.

Tested with Python 2.7.


Solution:

As pointed out by @kindall and @Blckknght, the reason is that d is pickled as a dict, and when it is unpickled by queue.get(), the self.__dict__ = self magic is not restored. The difference can be seen by comparing the output of print d.__dict__ with that of print d.

To set the magic back, I added the method __setstate__ to AttrDict:

class AttrDict(dict):
    def __init__(self, *args, **kwargs):
        super(AttrDict, self).__init__(*args, **kwargs)
        self.__dict__ = self

    def __setstate__(self, state):
        self.__dict__ = state

The code now works as expected.
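A quick way to check the fix without forking is to round-trip the object through pickle directly, since that is what the queue does under the hood. This is only a sketch: round_trip is a helper name made up for this example, and the code is written so that it also runs on Python 3.

```python
import pickle


class AttrDict(dict):
    def __init__(self, *args, **kwargs):
        super(AttrDict, self).__init__(*args, **kwargs)
        self.__dict__ = self

    def __setstate__(self, state):
        # Same fix as above: point the attribute namespace back at the
        # unpickled state instead of leaving a separate plain dict.
        self.__dict__ = state


def round_trip(obj):
    """Hypothetical helper: pickle and unpickle obj in the same process."""
    return pickle.loads(pickle.dumps(obj, pickle.HIGHEST_PROTOCOL))


d = round_trip(AttrDict({'a': 1, 'b': 2}))
d.a = 3                    # attribute write should now show up as a key
print(d)
print(d.__dict__ is d)     # True when the alias has been restored
```

Note that this was only checked for a non-empty AttrDict; an empty one may be pickled without any state, in which case __setstate__ would not be called.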

Upvotes: 0

Views: 1922

Answers (2)

Blckknght

Reputation: 104722

This isn't really a multiprocessing issue, as multiprocessing.Queue uses pickle to serialize and deserialize the objects you send through it. The problem lies with pickle not preserving the "magic" behavior you get when you set self.__dict__ = self.

If you check the object you get in the receiving process, you'll find that its __dict__ is just an ordinary dictionary, with the same contents as the object itself. When you set a new attribute on the object, its __dict__ gets updated, but the inherited dictionary self does not. Here's what I mean:

>>> import pickle
>>> d = AttrDict({"a":1, "b":2})
>>> d2 = pickle.loads(pickle.dumps(d, -1))
>>> d2
{'a': 1, 'b': 2}
>>> d2.b = 3
>>> d2
{'a': 1, 'b': 2}
>>> d2.__dict__
{'a': 1, 'b': 3}

While you could dive into the nitty gritty details of how pickle works and get your serialization working again, I think a simpler approach would be to rely on less magical behavior by having your class override the __getattr__, __setattr__ and __delattr__ methods:

class AttrDict(dict):
    __slots__ = () # we don't need a __dict__

    def __getattr__(self, name): # wrapper around dict.__getitem__, with an exception fix
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None # raise the right type of exception ("from None" is Python 3 only; omit it on 2.7)

    def __delattr__(self, name): # wrapper around dict.__delitem__
        try:
            del self[name]
        except KeyError:
            raise AttributeError(name) from None # change exception type here too

    __setattr__ = dict.__setitem__ # no special exception rewriting needed here

Instances of this class will work just like your own, but they can be pickled and unpickled successfully:

>>> d = AttrDict({"a":1, "b":2})
>>> d2 = pickle.loads(pickle.dumps(d, -1)) # serialize and unserialize
>>> d2
{'a': 1, 'b': 2}
>>> d2.b=3
>>> d2
{'a': 1, 'b': 3}
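The exception rewriting matters because hasattr() and the three-argument form of getattr() only treat AttributeError as "attribute missing". A small sketch of that, using the same class (Python 3 syntax; the names has_missing and fallback are only for illustration):

```python
class AttrDict(dict):
    __slots__ = ()  # no per-instance __dict__

    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def __delattr__(self, name):
        try:
            del self[name]
        except KeyError:
            raise AttributeError(name) from None

    __setattr__ = dict.__setitem__


d = AttrDict({"a": 1})

# A missing key surfaces as AttributeError, so these behave as usual
# instead of letting a KeyError escape.
has_missing = hasattr(d, "missing")
fallback = getattr(d, "missing", "default")
print(has_missing, fallback)
```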

Upvotes: 1

kindall

Reputation: 184200

My guess is that since it's a subclass of dict, your AttrDict is serialized as a dict. In particular the __dict__ pointing to self is probably not preserved. You can customize the serialization using certain magic methods; see this article.
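As one possible illustration of such a magic method (not necessarily what the linked article describes), __reduce__ can tell pickle to rebuild the object by calling the class again, so that __init__ runs on unpickling and re-creates the alias. A minimal sketch, written to run on Python 3:

```python
import pickle


class AttrDict(dict):
    def __init__(self, *args, **kwargs):
        super(AttrDict, self).__init__(*args, **kwargs)
        self.__dict__ = self

    def __reduce__(self):
        # Rebuild as AttrDict(plain_dict): __init__ is called again on
        # unpickling, which sets self.__dict__ = self once more.
        return (self.__class__, (dict(self),))


d = pickle.loads(pickle.dumps(AttrDict({'a': 1, 'b': 2})))
d.a = 3
print(d)                   # {'a': 3, 'b': 2}
print(d.__dict__ is d)     # True
```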

Upvotes: 1
