Useful_Investigator

Reputation: 948

Pickle with a specific module name

I am using the pickle library to serialise a custom object, let's call it A, which is defined in a.py.

If I pickle an object of type A in a.py as follows:

import pickle

class A:
    ...

if __name__ == "__main__":
    inst = A("some param")
    with open("a.pickle", 'wb') as dumpfile:
        pickle.dump(inst, dumpfile)

Then loading this object from storage fails unless the class A is explicitly available in the __main__ namespace of the loading program. See this other question. This is because pickle records that it should look for the class A in __main__, since that is where the class lived when pickle.dump() ran.
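To make the failure concrete, here is a minimal sketch of it. It assumes an in-memory module called fake_main (a made-up name standing in for __main__, so the example runs the same way however it is executed). It shows that the pickle stores the module name as text and that loading fails once that namespace no longer provides A.

```python
import pickle
import sys
import types

# Hypothetical stand-in for the script namespace; a real run would use "__main__".
fake_main = types.ModuleType("fake_main")
exec(
    "class A:\n"
    "    def __init__(self, param):\n"
    "        self.param = param\n",
    fake_main.__dict__,
)
sys.modules["fake_main"] = fake_main

payload = pickle.dumps(fake_main.A("some param"))

# The class is pickled by reference: only its module and name are stored.
module_recorded = b"fake_main" in payload

# Simulate a loading program whose namespace does not define A.
del fake_main.A
try:
    pickle.loads(payload)
    load_error = None
except AttributeError as exc:
    load_error = exc

print(module_recorded, type(load_error).__name__)
```

The class body itself never ends up in the payload, which is why the loader has to be able to resolve the recorded name.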

Now, there are two approaches to dealing with this:

  1. Deal with it at the deserialisation end,
  2. Deal with it at the serialisation end.

For 1, there are various options (see the above link, for example), but I want to avoid these, because I think it makes sense to give the 'pickler' responsibility regarding its data.

For 2, we could just avoid pickling when the module is running as __main__, but that doesn't seem very flexible. Alternatively, we could modify A.__module__ and set it to the name of the module (as done here). Pickle uses this __module__ attribute to decide where to import the class A from, so setting it before .dump() works:

if __name__ == "__main__":
    inst = A("some param")
    A.__module__ = 'a'
    with open("a.pickle", 'wb') as dumpfile:
        pickle.dump(inst, dumpfile)
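One caveat, sketched below under assumptions: as far as I can tell, at dump time pickle also resolves the recorded name and checks that it gets back the very same class object. The module names script_ns and a_lib are invented for illustration (script_ns plays the role of __main__, a_lib the role of the importable module a), and both are built in memory so the example is self-contained.

```python
import pickle
import sys
import types

SRC = (
    "class A:\n"
    "    def __init__(self, param):\n"
    "        self.param = param\n"
)

def make_module(name):
    """Create an in-memory module that defines its own copy of class A."""
    mod = types.ModuleType(name)
    exec(SRC, mod.__dict__)
    sys.modules[name] = mod
    return mod

script = make_module("script_ns")  # hypothetical stand-in for __main__
library = make_module("a_lib")     # hypothetical stand-in for module a

inst = script.A("some param")
script.A.__module__ = "a_lib"      # the workaround from the question

# a_lib.A exists but is a different class object, so the dump is refused.
try:
    pickle.dumps(inst)
    dump_error = None
except pickle.PicklingError as exc:
    dump_error = exc

# Once the recorded name resolves to the same class object, the dump succeeds.
library.A = script.A
payload = pickle.dumps(inst)
restored = pickle.loads(payload)
print(type(dump_error).__name__, restored.param)
```

So changing __module__ only changes the name that gets written; the name still has to resolve to the same class when dumping, and to a compatible class when loading.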

Q: Is this a good idea? It seems to depend on pickle's implementation rather than on its interface. That is, pickle could decide to use another method of locating the module to import, and this approach would break. Is there an alternative that uses pickle's interface?

Upvotes: 1

Views: 898

Answers (1)

Jacques Gaudin

Reputation: 16968

Another way around it would be to import the file itself:

import pickle
import a

class A:
    pass

def main():
    inst = a.A()
    print(inst.__module__)
    with open("a.pickle", 'wb') as dumpfile:
        pickle.dump(inst, dumpfile)

if __name__ == "__main__":
    main()

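Note that this does not recurse infinitely. When a.py is run as a script it is loaded under the name __main__, so import a executes the file once more as the module a; the import a inside that second run finds the partially initialised module already in sys.modules and just binds the name a to it. The __name__ guard keeps main() from running during the import. One consequence is that a.A and __main__.A are two distinct class objects, and the pickle refers to a.A.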
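If you want to check this yourself, the following sketch writes the script above to a temporary directory, runs it as a subprocess, and inspects what was pickled. The helper run_demo and the temporary layout are only there for the demonstration. It assumes the interpreter puts the script's directory on sys.path, which is the default.

```python
import pathlib
import pickle
import subprocess
import sys
import tempfile

# The script from this answer, written out verbatim as a.py.
SCRIPT = '''\
import pickle
import a

class A:
    pass

def main():
    inst = a.A()
    print(inst.__module__)
    with open("a.pickle", "wb") as dumpfile:
        pickle.dump(inst, dumpfile)

if __name__ == "__main__":
    main()
'''

def run_demo():
    """Run a.py as a script and return (printed module name, pickle bytes)."""
    with tempfile.TemporaryDirectory() as tmp:
        folder = pathlib.Path(tmp)
        (folder / "a.py").write_text(SCRIPT)
        result = subprocess.run(
            [sys.executable, "a.py"],
            cwd=tmp,
            capture_output=True,
            text=True,
            check=True,
        )
        payload = (folder / "a.pickle").read_bytes()
    return result.stdout.strip(), payload

printed_module, payload = run_demo()
print(printed_module, b"__main__" in payload)
```

The script finishes normally, the instance reports the module a, and the pickle contains no reference to __main__, so any program that can import a can load it.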

Upvotes: 1
