Pickle with a specific module name

Question

I am using the pickle library to serialise a custom object, let's call it A, which is defined in a.py.

If I pickle an object of type A in a.py as follows:

import pickle

class A:
    ...

if __name__ == "__main__":
    inst = A("some param")
    with open("a.pickle", 'wb') as dumpfile:
        pickle.dump(inst, dumpfile)

Then loading this object from storage fails unless the class A is available in the __main__ namespace of the program doing the loading. See this other question. This is because pickle records that it should look for the class A in __main__, since that is where A lived when pickle.dump() ran.
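To make that concrete, here is a minimal sketch that lists the strings pickle embeds in the stream. The __init__ body of A is an assumption (the original class body is elided), and protocol 4 is pinned only so that the module and class names appear as separate strings.

```python
import pickle
import pickletools


class A:
    # Assumed constructor; the real body of A is not shown in the question.
    def __init__(self, param):
        self.param = param


payload = pickle.dumps(A("some param"), protocol=4)

# Collect every string argument stored in the pickle stream.
strings = [arg for _op, arg, _pos in pickletools.genops(payload)
           if isinstance(arg, str)]

# The class is stored by reference: its module name and its own name,
# not its code. When this file is run as a script the module is "__main__".
print(A.__module__)
print(strings)
```

The stream contains only the module name, the class name and the instance state, so the loading side has to be able to import that module name and find A in it.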

Now, there are two approaches to dealing with this:

1. Deal with it at the deserialisation end,
2. Deal with it at the serialisation end.

For 1, there are various options (see the above link, for example), but I want to avoid these, because I think it makes sense to give the 'pickler' responsibility regarding its data.

For 2, we could just avoid pickling when the module is running under the __main__ namespace, but that doesn't seem very flexible. Alternatively, we could modify A.__module__ and set it to the real name of the module (as done here). Pickle uses the class's __module__ attribute to decide where to import A from, so setting it before .dump() works:

if __name__ == "__main__":
    inst = A("some param")
    A.__module__ = 'a'
    with open("a.pickle", 'wb') as dumpfile:
        pickle.dump(inst, dumpfile)

Q: is this a good idea? It seems like it's implementation dependent, not interface dependent. That is, pickle could decide to use another method of locating modules to import, and this approach would break. Is there an alternative that uses pickle's interface?
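For comparison, a sketch of one variant that relies only on the documented rule that classes are pickled by their fully qualified name: reach A through its importable module, so that __module__ is already 'a' without being patched. The temporary directory and the __init__ body are assumptions made so the snippet runs on its own; in the real a.py the equivalent would be doing "import a" inside the __main__ guard and instantiating a.A.

```python
import importlib
import pickle
import sys
import tempfile
from pathlib import Path

# Stand-in for a.py from the question; the constructor body is assumed.
A_SOURCE = '''
class A:
    def __init__(self, param):
        self.param = param
'''

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.py").write_text(A_SOURCE)
    sys.path.insert(0, tmp)
    try:
        # Import the class through its module instead of using __main__.A.
        a = importlib.import_module("a")
    finally:
        sys.path.remove(tmp)

    inst = a.A("some param")
    payload = pickle.dumps(inst)

    # Loading works anywhere module "a" is importable; here it is still
    # registered in sys.modules.
    restored = pickle.loads(payload)

print(a.A.__module__)  # a
print(restored.param)  # some param
```

Nothing is overridden here, so the pickle refers to a.A simply because that is where the class was imported from.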
