mathieu

Reputation: 3184

serializing and deserializing lambdas

I would like to serialize a Python lambda on machine A and deserialize it on machine B. There are a couple of obvious problems with that:

Hence, my question(s):

Upvotes: 17

Views: 9493

Answers (3)

Erik Aronesty

Reputation: 12935

I wrote a library called msgpickle (available via pip install). I wrote it because I wanted a pickler where I could easily control what can and cannot be pickled.

So, while pickling lambdas is unsafe, you can enable it as needed, create new picklers for any class, and build a serialization strategy that is sound.

It is built on msgpack, which it uses as its default serializer.

In order to serialize lambdas, you can do this:

serializer = msgpickle.MsgPickle()
serializer.register(*msgpickle.cloud_function_serializer)

Now your serializer supports lambdas:

dat = serializer.dumps(lambda: 99)
fun = serializer.loads(dat)
assert fun() == 99

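The core of it is the function packer, which just uses Python's code object:

from typing import Any

# NOTE: code_type_params is not defined in this excerpt. It is presumably a
# module-level mapping of the types.CodeType constructor parameters
# (name -> parameter object), since only .values() and .name are used below.

def cloud_func_pack(obj: Any) -> Any:
    # Pack a function as the list of values needed to rebuild its code object.
    code_obj = obj.__code__
    # this has a chance of working for future versions of Python
    # Map constructor parameter names to co_* attribute names where they differ.
    xmap = {"codestring": "code", "constants": "consts"}
    code_arg_names = [
        "co_" + xmap.get(param.name, param.name) for param in code_type_params.values()
    ]

    def convert(value: Any) -> Any:
        # Return tuples as lists (msgpack has arrays but no distinct tuple type).
        if isinstance(value, tuple):
            return list(value)
        return value

    code_attributes = [convert(getattr(code_obj, attr)) for attr in code_arg_names]
    return code_attributes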

Because the raw code object is what gets serialized, this is unsafe across Python versions, but that is true of any other pickler as well.
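One way to make the version caveat explicit is to tag the payload with the interpreter version and refuse to load a mismatch. The following is only a sketch of that idea using the standard library's marshal module; it is not how msgpickle works, and the names pack_function and unpack_function are made up for illustration.

```python
import marshal
import sys
import types


def pack_function(func):
    """Serialize func's code object, prefixed with the interpreter's major.minor."""
    header = bytes(sys.version_info[:2])  # two bytes: major, minor
    return header + marshal.dumps(func.__code__)


def unpack_function(data, globals_dict):
    """Rebuild a function, refusing data written by a different Python version."""
    written_by = tuple(data[:2])
    if written_by != tuple(sys.version_info[:2]):
        raise ValueError(f"code was packed on Python {written_by}, refusing to load")
    # Only reached when the versions match, so the bytecode format is the same.
    code = marshal.loads(data[2:])
    return types.FunctionType(code, globals_dict)


fun = unpack_function(pack_function(lambda: 99), globals())
print(fun())  # 99
```

The check only guards against version drift. Loading code from an untrusted sender is still arbitrary code execution.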

The difference is the simplicity and explicitness.

Upvotes: 0

Mike McKerns

Reputation: 35247

I'm not sure exactly what you want to do, but you could try dill. Dill can serialize and deserialize lambdas, and I believe it also works for lambdas inside closures. The pickle API is a subset of its API. To use it, just "import dill as pickle" and go about your business pickling stuff.

>>> import dill
>>> testme = lambda x: lambda y:x
>>> _testme = dill.loads(dill.dumps(testme))
>>> testme
<function <lambda> at 0x1d92530>
>>> _testme
<function <lambda> at 0x1d924f0>
>>> 
>>> def complicated(a,b):
...   def nested(x):
...     return testme(x)(a) * b
...   return nested
... 
>>> _complicated = dill.loads(dill.dumps(complicated))
>>> complicated 
<function complicated at 0x1d925b0>
>>> _complicated
<function complicated at 0x1d92570>

Dill registers its types into the pickle registry, so if you have some black-box code that uses pickle and you can't really edit it, then just importing dill can magically make it work without monkeypatching the third-party code. Or, if you want the whole interpreter session sent over the wire as a "python image", dill can do that too.

>>> # continuing from above
>>> dill.dump_session('foobar.pkl')
>>>
>>> ^D
dude@sakurai>$ python
Python 2.7.5 (default, Sep 30 2013, 20:15:49) 
[GCC 4.2.1 (Apple Inc. build 5566)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> dill.load_session('foobar.pkl')
>>> testme(4)
<function <lambda> at 0x1d924b0>
>>> testme(4)(5)
4
>>> dill.source.getsource(testme)
'testme = lambda x: lambda y:x\n'

You can easily send the image across ssh to another computer and pick up where you left off, as long as the pickle versions are compatible, with the usual caveats about Python changing between versions and about which packages are installed on each machine. As shown, you can also extract the source of the lambda that was defined in the previous session.

Dill also has some good tools to help you understand why pickling fails when it does.

Upvotes: 4

David Wolever

Reputation: 154624

Surprisingly, checking whether a lambda will work without its associated closure is actually fairly easy. According to the data model documentation, you can just check the func_closure attribute (spelled __closure__ in Python 3):

>>> def get_lambdas():
...     bar = 42
...     return (lambda: 1, lambda: bar)
...
>>> no_vars, vars = get_lambdas()
>>> print no_vars.func_closure
None
>>> print vars.func_closure
(<cell at 0x1020d3d70: int object at 0x7fc150413708>,)
>>> print vars.func_closure[0].cell_contents
42
>>>
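The session above is Python 2. As a rough Python 3 equivalent, the same check can be written against __closure__. The helper name has_closure is made up for illustration:

```python
def get_lambdas():
    bar = 42
    # The first lambda uses nothing from the enclosing scope;
    # the second closes over `bar`.
    return (lambda: 1, lambda: bar)


def has_closure(func):
    """Return True if func captures variables from an enclosing scope."""
    # __closure__ is None for functions with no free variables,
    # otherwise a tuple of cell objects.
    return func.__closure__ is not None


no_vars, with_vars = get_lambdas()
print(has_closure(no_vars))    # False
print(has_closure(with_vars))  # True
print(with_vars.__closure__[0].cell_contents)  # 42
```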

Then serializing and loading the lambda is fairly straightforward (func_code is spelled __code__ in Python 3):

>>> import marshal, types
>>> old = lambda: 42
>>> old_code_serialized = marshal.dumps(old.func_code)
>>> new_code = marshal.loads(old_code_serialized)
>>> new = types.FunctionType(new_code, globals())
>>> new()
42

It's worth taking a look at the documentation for FunctionType:

function(code, globals[, name[, argdefs[, closure]]])

Create a function object from a code object and a dictionary.
The optional name string overrides the name from the code object.
The optional argdefs tuple specifies the default argument values.
The optional closure tuple supplies the bindings for free variables.

Notice that you can also supply a closure, which means you might even be able to serialize the old function's closure and then load it at the other end :)
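Here is a minimal sketch of that closure round trip in Python 3, assuming Python 3.8+ (for types.CellType) and closure values that are themselves picklable. The names dump_lambda and load_lambda are hypothetical, and default arguments are not preserved:

```python
import marshal
import pickle
import types


def dump_lambda(func):
    """Serialize a function's code object and closure values to bytes."""
    cells = func.__closure__ or ()
    payload = {
        # marshal handles code objects; its format is Python-version specific.
        "code": marshal.dumps(func.__code__),
        # Keep only the cell contents; they must themselves be picklable.
        "closure": [cell.cell_contents for cell in cells],
    }
    return pickle.dumps(payload)


def load_lambda(data, globals_dict):
    """Rebuild a function from the bytes produced by dump_lambda."""
    payload = pickle.loads(data)
    code = marshal.loads(payload["code"])
    # Wrap each value in a fresh cell, in the same order as code.co_freevars.
    closure = tuple(types.CellType(value) for value in payload["closure"])
    return types.FunctionType(code, globals_dict, code.co_name, None, closure or None)


def make_adder(n):
    return lambda x: x + n


restored = load_lambda(dump_lambda(make_adder(3)), globals())
print(restored(4))  # 7
```

Both ends need the same Python version, and the receiving side must trust the sender, since this rebuilds and runs arbitrary code.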

Upvotes: 22
