alexis

Reputation: 50200

How can I see what data is included in a pickle dump?

Sometimes a pickle dump is unexpectedly large. Assuming I can successfully pickle and unpickle an object, is there a way to inspect the dump and see exactly what is included?

Pickled objects include data but not code. If I didn't write the code, and the object is complex (e.g., an instance of a custom class with accessors and lots of references to other data), it can be difficult to identify what is included in the dump and what is taking up so much space. Hence this question.

Upvotes: 3

Views: 1671

Answers (1)

davidism

Reputation: 127240

The built-in pickletools module can output information about each opcode in a pickle file. When run from the command line or called through pickletools.dis, it prints the opcodes in a readable format. The example from the docs, with a tuple (1, 2) pickled in file x.pickle:

$ python -m pickle x.pickle
(1, 2)

$ python -m pickletools x.pickle
    0: \x80 PROTO      3
    2: K    BININT1    1
    4: K    BININT1    2
    6: \x86 TUPLE2
    7: q    BINPUT     0
    9: .    STOP
highest protocol among opcodes = 2
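
The same listing can be produced from Python with pickletools.dis. A minimal sketch, assuming the pickle is built in memory rather than read from x.pickle, and that protocol 3 is requested to match the output above (the variable names are only illustrative):

```python
import io
import pickle
import pickletools

# Pickle the same tuple as in the command-line example, but in memory.
data = pickle.dumps((1, 2), protocol=3)

# dis() writes to stdout by default; capture the listing in a buffer instead.
buffer = io.StringIO()
pickletools.dis(data, out=buffer)
listing = buffer.getvalue()
print(listing)
```

dis also accepts an open binary file object, so a dump on disk can be disassembled without reading it into memory first.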

To get detailed information about an opcode, look it up in the pickletools.code2op dict. pickletools.genops iterates over the pickle data and yields each opcode's descriptor together with its argument and its position in the data. For example, \x86 TUPLE2 from above means:

>>> import pickletools
>>> print(pickletools.code2op['\x86'].doc)
Build a two-tuple out of the top two items on the stack.

      This code pops two values off the stack and pushes a tuple of
      length 2 whose items are those values back onto it.  In other
      words:

          stack[-2:] = [tuple(stack[-2:])]
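
To find what is taking up space, one possible approach is to use genops to measure how many bytes each opcode occupies and list the largest ones. This is only a sketch: largest_ops and the sample dict are hypothetical names made up for illustration, and it assumes the data holds a single pickle with nothing after the STOP opcode.

```python
import pickle
import pickletools

def largest_ops(data, n=5):
    """Return the n largest opcodes as (size, position, name, argument) tuples."""
    ops = list(pickletools.genops(data))
    # Each opcode ends where the next one starts; the last one ends at len(data).
    ends = [pos for _, _, pos in ops[1:]] + [len(data)]
    sized = [(end - pos, pos, op.name, arg)
             for (op, arg, pos), end in zip(ops, ends)]
    # Sort on size only, so arguments of different types are never compared.
    return sorted(sized, key=lambda item: item[0], reverse=True)[:n]

# Hypothetical sample object: one long string and a list of small ints.
sample = pickle.dumps({"name": "x" * 1000, "values": list(range(100))})
for size, pos, name, arg in largest_ops(sample):
    print(size, pos, name, repr(arg)[:40])
```

The argument printed next to each opcode (a string, a number, a class name) usually makes it clear which part of the object the bytes belong to.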

Note that while a pickle is potentially unsafe to load (as it can execute arbitrary code), the pickle is not actually loaded when disassembling it, so it is safe to inspect the data.

Upvotes: 4
