Lior Graf

Reputation: 163

Why is Python's pickle not serializing a method as a default argument?

I am trying to use pickle to transfer Python objects over the wire between two servers. I created a simple class that subclasses dict, and I am trying to use pickle for the marshalling:

import pickle

def value_is_not_none(value):
    return value is not None

class CustomDict(dict):
    def __init__(self, cond=lambda x: x is not None):
        super().__init__()
        self.cond = cond

    def __setitem__(self, key, value):
        if self.cond(value):
            dict.__setitem__(self, key, value)

I first tried to use pickle for the marshalling, but when I un-marshalled I received an error related to the lambda expression.

Then I tried to do the marshalling with dill but it seemed the __init__ was not called.

Then I tried again with pickle, but this time I passed the value_is_not_none() function as the cond parameter. Again, __init__() does not seem to be invoked, and the un-marshalling failed in __setitem__() (cond is None).

Why is that? What am I missing here?

If I try to run the following code:

obj = CustomDict(cond=value_is_not_none)
obj['hello'] = ['world']

payload = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
obj2 = pickle.loads(payload)

it fails with

AttributeError: 'CustomDict' object has no attribute 'cond'

This is a different question from "Python, cPickle, pickling lambda functions": I tried using dill with a lambda and it failed to work, and I also tried passing a function, which failed as well.

Upvotes: 3

Views: 888

Answers (1)

Martijn Pieters

Reputation: 1124170

pickle is loading your dictionary data before it has restored the attributes on your instance. As such the self.cond attribute is not yet set when __setitem__ is called for the dictionary key-value pairs.

Note that by default pickle never calls __init__; instead it creates an entirely blank instance and restores the __dict__ attribute namespace on that directly.
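
The load order described above can be observed with a small probe. This is only a sketch: Probe, tag and calls are hypothetical names made up for the illustration, and it assumes the default pickling behaviour for a dict subclass that defines no __reduce__ of its own.

```python
import pickle

calls = []  # records what pickle invokes while loading


class Probe(dict):
    """Hypothetical dict subclass that logs __init__ and __setitem__ calls."""

    def __init__(self):
        calls.append("__init__")
        super().__init__()
        self.tag = "set in __init__"

    def __setitem__(self, key, value):
        # Record whether the instance attribute already exists at this point.
        calls.append(("__setitem__", key, hasattr(self, "tag")))
        super().__setitem__(key, value)


original = Probe()
original["hello"] = "world"

calls.clear()  # only look at what happens during the round trip
restored = pickle.loads(pickle.dumps(original))

print(calls)         # [('__setitem__', 'hello', False)]
print(restored.tag)  # set in __init__
```

No "__init__" entry is logged during loading, and __setitem__ runs while tag is still missing; the attribute only appears afterwards, once the saved __dict__ has been restored.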

You have two options:

  • Default to cond=None and ignore the condition if it is still set to None:

    class CustomDict(dict):
        def __init__(self, cond=None):
            super().__init__()
            self.cond = cond
    
        def __setitem__(self, key, value):
            if getattr(self, 'cond', None) is None or self.cond(value):
                dict.__setitem__(self, key, value)
    

    The getattr() there is needed because a blank instance has no cond attribute at all (it is not set to None, the attribute is entirely missing). You could add cond = None to the class:

    class CustomDict(dict):
        cond = None
    

    and then just test if self.cond is None or self.cond(value): in __setitem__.

  • Define a custom __reduce__ method to control how the initial object is created when restored:

    def _default_cond(v): return v is not None
    
    class CustomDict(dict):
        def __init__(self, cond=_default_cond):
            super().__init__()
            self.cond = cond
    
        def __setitem__(self, key, value):
            if self.cond(value):
                dict.__setitem__(self, key, value)
    
        def __reduce__(self):
            return (CustomDict, (self.cond,), None, None, iter(self.items()))
    

    __reduce__ is expected to return a tuple with:

    • A callable that can be pickled directly (here the class does fine)
    • A tuple of positional arguments for that callable. On unpickling, the first element is called with the second as its arguments, so setting this to (self.cond,) ensures that the new instance is created with cond passed in, and this time CustomDict.__init__() is called.
    • The next two positions are for the object state (which would be passed to __setstate__; not needed here) and for an iterator of items for list-like types, so we set both to None.
    • The last element is an iterator for the key-value pairs that pickle then will restore for us.

    Note that I replaced the default value for cond with a function here too so you don't have to rely on dill for the pickling.
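
To check both options end to end, a round trip along the following lines could be used. It is a minimal sketch: the class names CustomDictOpt1 and CustomDictOpt2 and the round_trip() helper are invented here so that both variants can live in one script, and the condition is a module-level function so that plain pickle can store it by reference.

```python
import pickle


def value_is_not_none(value):
    return value is not None


class CustomDictOpt1(dict):
    """First option: tolerate a missing condition while pickle refills the dict."""

    cond = None  # class-level fallback, visible before pickle restores __dict__

    def __init__(self, cond=None):
        super().__init__()
        self.cond = cond

    def __setitem__(self, key, value):
        if self.cond is None or self.cond(value):
            dict.__setitem__(self, key, value)


class CustomDictOpt2(dict):
    """Second option: rebuild through __init__ by way of __reduce__."""

    def __init__(self, cond=value_is_not_none):
        super().__init__()
        self.cond = cond

    def __setitem__(self, key, value):
        if self.cond(value):
            dict.__setitem__(self, key, value)

    def __reduce__(self):
        return (CustomDictOpt2, (self.cond,), None, None, iter(self.items()))


def round_trip(obj):
    """Pickle and unpickle obj with the highest available protocol."""
    return pickle.loads(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))


restored = []
for cls in (CustomDictOpt1, CustomDictOpt2):
    obj = cls(cond=value_is_not_none)
    obj["hello"] = ["world"]
    obj["dropped"] = None  # rejected by the condition, never stored
    clone = round_trip(obj)
    clone["dropped_again"] = None  # the restored condition still filters
    restored.append(clone)
    print(cls.__name__, dict(clone), clone.cond is value_is_not_none)
```

One difference worth keeping in mind: with the first option the stored items are not re-checked against cond while loading, because the condition is not available yet at that point, whereas with the second option every item passes through cond again.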

Upvotes: 2
