Alex Z

Reputation: 1539

Unpickling from converted string in python/numpy

I have a ton of numpy ndarrays that are stored pickled as strings. That may have been a poor design choice, but it's what I did. Now the pickled strings seem to have been converted somewhere along the way: when I try to unpickle them, I notice they are of type str, and I get the following error:

TypeError: 'str' does not support the buffer interface

when I invoke

numpy.loads(bin_str)

where bin_str is the thing I'm trying to unpickle. If I print out bin_str, it looks like

b'\x80\x02cnumpy.core.multiarray\n_reconstruct\nq\x00cnumpy\nndarray\nq\x01K\x00\x85q\x02c_codecs\nencode\nq\x03X\x01\x00\x00\ ...

continuing for some time, so the information seems to be there; I'm just not sure how to convert it into whatever string format numpy/pickle needs. On a whim I tried

numpy.loads( bytearray(bin_str, encoding='utf-8') )

and

numpy.loads( bin_str.encode() )

which both raise _pickle.UnpicklingError: unpickling stack underflow. Any ideas?
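To see why both attempts fail, assume (as the answer below concludes) that bin_str holds the repr text of the pickle bytes rather than the bytes themselves. Encoding that text, whether with bin_str.encode() or bytearray(bin_str, encoding='utf-8'), yields the bytes of the characters b, ', \, x, 8, 0 and so on, not the single byte 0x80 that starts a real pickle. The sketch below simulates this with a small stand-in array; orig_pickle, bin_str and encoded are illustrative names only.

```python
import pickle

import numpy as np

# Simulate the accident: the pickle bytes were turned into their repr text.
orig_pickle = pickle.dumps(np.array([1, 2, 3]))
bin_str = repr(orig_pickle)

# .encode() gives the bytes of the *characters* of that text,
# so the leading byte 0x80 is now spelled out as "b'\x80..." in ASCII.
encoded = bin_str.encode()
print(orig_pickle[:1])         # b'\x80'
print(encoded == orig_pickle)  # False

# The first byte is now the letter "b", which pickle reads as its BUILD opcode.
# BUILD needs two objects on the (still empty) stack, hence the underflow error.
try:
    pickle.loads(encoded)
except pickle.UnpicklingError as exc:
    print("UnpicklingError:", exc)
```

In other words, the data is not corrupted, it is just one level of "text quoting" away from the real bytes.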

PS: I'm on Python 3.3.2 and NumPy 1.7.1.

Edit

I discovered that if I do the following:

open('temp.txt', 'wb').write(...)
return numpy.load( 'temp.txt' )

I get back my array, where ... denotes the output of print(bin_str) copied and pasted from another window. I've tried writing bin_str to a file directly and unpickling that, but it doesn't work: it complains TypeError: 'str' does not support the buffer interface. A few sane ways of converting bin_str into something that can be written directly to a binary file result in pickle errors when I try to read it back.

Edit 2

So I guess what happened is that my binary pickle string ended up encoded inside a normal string, something like:

"b'pickle'"

which is unfortunate, and I haven't figured out how to deal with it except for this ridiculous, convoluted way of getting it back:

open('temp.py', 'w').write('foo = ' + bin_str)
from temp import foo
numpy.loads( foo )

This seems like a very shameful solution to the problem, so please give me a better one!
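The question doesn't say which step did the conversion. One common way it can happen (an assumption here, not confirmed by the question) is passing a bytes object through str() or string formatting in Python 3: instead of raising, these return the repr, i.e. text starting with b' or b". That is easy to hit when writing pickles into a text file or a text database column. A short sketch, using illustrative variable names:

```python
import pickle

import numpy as np

orig_pickle = pickle.dumps(np.array([1, 2, 3]))

# In Python 3, str() and string formatting of a bytes object return its repr,
# so each of these silently produces the "b'...'" text described above.
as_str = str(orig_pickle)
as_format = "{}".format(orig_pickle)
as_percent = "%s" % (orig_pickle,)

print(as_str == repr(orig_pickle) == as_format == as_percent)  # True
print(as_str[:2])  # the literal prefix b' (or b") as ordinary text
```

Running Python with the -b flag makes these implicit bytes-to-str conversions emit a BytesWarning, which can help track down where the conversion happened.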

Upvotes: 1

Answers (1)

Blckknght

Reputation: 104722

It sounds like your saved strings are the reprs of the original bytes objects returned by your pickling code. That's a bit unfortunate, but not too bad. repr is intended to return a "machine-friendly" representation of an object, and it can often be reversed by using eval:

import numpy as np
import pickle

# this part has already happened
orig_obj = np.array([1,2,3])
orig_pickle = pickle.dumps(orig_obj)
saved_str = repr(orig_pickle)     # this was a mistake, but it's already done

# this is what you need to do to get something equivalent to orig_obj back
reconstructed_pickle = eval(saved_str)
reconstructed_obj = pickle.loads(reconstructed_pickle)

# test
if np.all(reconstructed_obj == orig_obj):
    print("It worked!")

Obligatory note that using eval can be dangerous: eval will run any Python code it is given, so don't call it on untrusted data. However, pickle data carries the same risk (a malicious pickle string can run arbitrary code when it is unpickled), so you're not losing much safety in this situation. I'm guessing that you trust your data in this case anyway.
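If you'd rather not call eval at all, one alternative (not part of the original answer) is ast.literal_eval from the standard library, which only evaluates Python literals such as a bytes literal and raises ValueError for anything else; bytes literals are accepted since Python 3.2, so this should also work on 3.3. A minimal sketch for converting many saved strings, where unpickle_repr and saved_strings are hypothetical names used for illustration:

```python
import ast
import pickle

import numpy as np


def unpickle_repr(saved_str):
    """Turn the repr text of a pickle back into the original object.

    ast.literal_eval only accepts Python literals (here, a bytes literal),
    so unlike eval it will not execute arbitrary expressions in the text.
    The pickle.loads step still trusts the data, exactly as before.
    """
    pickle_bytes = ast.literal_eval(saved_str)
    if not isinstance(pickle_bytes, bytes):
        raise ValueError("expected a bytes literal, got %s"
                         % type(pickle_bytes).__name__)
    return pickle.loads(pickle_bytes)


# Hypothetical stand-in for the many stored strings.
saved_strings = [repr(pickle.dumps(np.arange(n))) for n in range(1, 4)]
arrays = [unpickle_repr(s) for s in saved_strings]
print([a.tolist() for a in arrays])  # [[0], [0, 1], [0, 1, 2]]
```

This only removes the eval risk: pickle.loads can still execute code embedded in a malicious pickle, so the data itself still has to be trusted.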

Upvotes: 2