Reputation: 177
import pickle
data_pkl = open("data.pkl", "rb")
d_c = data_pkl.read()
data_pkl.close()
print(d_c)
I am new to handling data structures. When I tried to read the pickle data, the result looked like this:
b'\x80\x03}q\x00(X\x05\x00\x00\x00Phoneq\x01}q\x02(cnumpy.core.multiarray\nscalar\nq\x03cnumpy\ndtype\nq\x04X\x02\x00\x00\x00i8q\x05K\x00K\x01\x87q\x06Rq\x07(K\x03X\x01\x00\x00\x00
......... long line
How can I convert this to a human-readable format in Python?
Upvotes: 1
Views: 12657
Reputation: 693
When Google brought me to this question, the answer that I would have liked to have seen was to import pickletools and then use pickletools.dis(s) to explain what the various characters between the understandable substrings within the pickle s were indicating. This is only marginally human-readable, since it reads more like machine assembly language than Python, but it still helps a human reader to peer behind the curtain and make some sense of the gobbledygook.
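As a minimal sketch of that idea (the sample dictionary below is made up for illustration, it is not the asker's data), the disassembly can be captured in a buffer and printed. pickletools.dis only walks the opcodes, so it does not import or run anything referenced by the pickle:

```python
import io
import pickle
import pickletools

# Hypothetical sample data; any picklable object would do.
data = {"text": "value", "list": [1, 2, 3]}
s = pickle.dumps(data)

# pickletools.dis writes one line per opcode. Capturing it in a buffer
# lets us print the listing or inspect it afterwards.
listing = io.StringIO()
pickletools.dis(s, out=listing)
print(listing.getvalue())
```

Each line of the listing shows a byte offset, the opcode name (PROTO, BINUNICODE, STOP and so on) and its argument, which is what makes the bytes between the readable substrings interpretable.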
Of course, what we usually want isn't for humans to read serialized data, but for computers to read it and make good use of it. When that's what you want, pickle.load or pickle.loads are the way to go. Or if, for some reason, you want to serialize your data in a format that is both human-readable and machine-readable, you probably want some other serializer, like JSON, or you could set pickle to encode with the original pickle protocol 0, which was human-readable (but less efficient).
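To make those two alternatives concrete, here is a small sketch using made-up sample data. It assumes the data contains only JSON-compatible types (dicts, lists, strings, numbers); something like the numpy scalars in the question would need converting first:

```python
import json
import pickle

# Hypothetical sample data for illustration only.
data = {"text": "value", "list": [1, 2, 3]}

# Protocol 0 is the original text-based pickle format. For plain ASCII
# data like this, the result can be decoded and read as text.
text_pickle = pickle.dumps(data, protocol=0)
print(text_pickle.decode("ascii"))

# JSON is readable by people and by most other languages.
json_text = json.dumps(data, indent=2)
print(json_text)
```

Both forms load back to the same dictionary, via pickle.loads(text_pickle) and json.loads(json_text) respectively.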
Upvotes: 2
Reputation: 22952
When data is dumped, pickle produces a bytes string. This is what you have.
For instance:
import pickle
data = {'text': 'value', 'list': [1, 2, 3]}
s = pickle.dumps(data)
print(s)
Produces the bytes string:
b'\x80\x03}q\x00(X\x04\x00\x00\x00textq\x01X\x05\x00\x00'
b'\x00valueq\x02X\x04\x00\x00\x00listq\x03]q\x04(K\x01K'
b'\x02K\x03eu.'
Note: I split the long line into three parts for readability.
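Python defines several pickle protocols. The constants pickle.HIGHEST_PROTOCOL and pickle.DEFAULT_PROTOCOL give the number of the newest protocol and of the one used by default. If you change the protocol, you get a different bytes string for the same data.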
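A quick way to see the effect of the protocol is to dump the same object with every protocol the running interpreter supports. This is only a sketch with made-up sample data; the exact bytes and lengths depend on your Python version:

```python
import pickle

# Hypothetical sample data for illustration only.
data = {"text": "value", "list": [1, 2, 3]}

print("default:", pickle.DEFAULT_PROTOCOL, "highest:", pickle.HIGHEST_PROTOCOL)

# Dump the same object once per available protocol and compare.
encoded = {}
for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
    encoded[protocol] = pickle.dumps(data, protocol=protocol)
    print(protocol, len(encoded[protocol]), encoded[protocol][:12])
```

From protocol 2 onwards the output starts with the byte \x80 followed by the protocol number, which is why the dump in the question begins with b'\x80\x03'.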
To get the data back, use pickle.load to read from an open binary file, or pickle.loads to read from a bytes string.
For instance:
import pprint
obj = pickle.loads(s)
pprint.pprint(obj)
You get:
{'list': [1, 2, 3], 'text': 'value'}
Cool, but if your data contains an instance of an unknown type, you won’t be able to deserialize it.
Here is an example:
import pickle
import pprint

class UnknownClass:
    def __init__(self, value):
        self.value = value

data = {'text': 'value',
        'list': [1, 2, 3],
        'u': UnknownClass(25)}
s = pickle.dumps(data)
print(s)

del UnknownClass

obj = pickle.loads(s)
The del statement is here to simulate an unknown type.
The result will be:
Traceback (most recent call last):
File "/path/to/stack.py", line 19, in <module>
obj = pickle.loads(s)
AttributeError: Can't get attribute 'UnknownClass' on <module '__main__' from '/path/to/stack.py'>
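If you only need to look at the contents and the original class is really gone, one possible workaround is to subclass pickle.Unpickler and override find_class so that it hands back a stand-in class. This is a sketch, not a standard recipe: TolerantUnpickler, Placeholder and vanished_module are names I made up, and the fake module only exists to reproduce the "missing class" situation in a self-contained way. It still runs whatever the pickle asks for, so use it on trusted data only.

```python
import io
import pickle
import sys
import types


class Placeholder:
    """Base class for stand-ins created when the real class is missing."""


class TolerantUnpickler(pickle.Unpickler):
    """Hypothetical helper: substitutes a stub for classes that cannot be found."""

    def find_class(self, module, name):
        try:
            return super().find_class(module, name)
        except (ImportError, AttributeError):
            # Build an empty class with the original name so the
            # instance attributes can still be restored onto it.
            return type(name, (Placeholder,), {"__module__": module})


def make_pickle_with_missing_class():
    """Pickle an object whose class lives in a module that then disappears."""
    module = types.ModuleType("vanished_module")  # made-up module name

    class UnknownClass:
        def __init__(self, value):
            self.value = value

    # Make the class look like a top-level member of the fake module.
    UnknownClass.__module__ = module.__name__
    UnknownClass.__qualname__ = "UnknownClass"
    module.UnknownClass = UnknownClass
    sys.modules[module.__name__] = module
    try:
        return pickle.dumps({"text": "value", "u": UnknownClass(25)})
    finally:
        # Removing the module simulates data pickled by code we no longer have.
        del sys.modules[module.__name__]


s = make_pickle_with_missing_class()
obj = TolerantUnpickler(io.BytesIO(s)).load()
print(type(obj["u"]).__name__, vars(obj["u"]))
```

The stub has none of the original methods, but its attributes (here value) are restored, which is often enough to inspect what was stored.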
For more info, the protocols are specified in the Python documentation.
Upvotes: 2
Reputation: 1230
I would recommend looking at the Python documentation, in particular the pickle module docs. Your current code is importing pickle, but it's not actually using pickle, since you're just loading the file using read(). Using pickle.load() or another pickle method should do the trick.
For example:
d_c = pickle.load(data_pkl)
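Put together as a complete script, it could look like the sketch below. The temporary directory and the sample dictionary are only there to make the example self-contained; with your own file you would just open "data.pkl" in "rb" mode:

```python
import pickle
import pprint
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file, written here only so there is something to read.
    path = Path(tmp) / "data.pkl"
    with open(path, "wb") as f:
        pickle.dump({"text": "value", "list": [1, 2, 3]}, f)

    # The actual fix: pass the open binary file to pickle.load
    # instead of calling read() on it.
    with open(path, "rb") as data_pkl:
        d_c = pickle.load(data_pkl)

pprint.pprint(d_c)
```

Using with blocks also closes the file for you, so the explicit close() call is no longer needed.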
Editing to add the mandatory pickle warning from the docs:
Warning: The pickle module is not secure against erroneous or maliciously constructed data. Never unpickle data received from an untrusted or unauthenticated source.
(Unpickling an unknown file leaves you open to having arbitrary code executed on your computer, so be careful what you unpickle!)
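If you do have to open a pickle you don't fully trust, the pickle docs describe restricting which globals may be loaded by overriding Unpickler.find_class. The sketch below follows that approach. The allowlist is an arbitrary example of mine, and a file like the one in the question would also need the numpy names it references added before it would load:

```python
import builtins
import io
import pickle

# Example allowlist (an assumption for this sketch): a few harmless builtins.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Only hand back names that are explicitly allowed.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        # Everything else is refused before it can be called.
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data):
    """Like pickle.loads, but refuses globals outside the allowlist."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


print(restricted_loads(pickle.dumps([1, 2, range(15)])))

# A hand-written pickle that would call os.system if loaded normally.
try:
    restricted_loads(b"cos\nsystem\n(S'echo hello world'\ntR.")
except pickle.UnpicklingError as exc:
    print("blocked:", exc)
```

Plain containers, strings and numbers never go through find_class, so ordinary data still loads while anything that needs to import a callable is rejected.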
Upvotes: 0