Reputation: 35
I am trying to save and load variables (dictionaries) to use in other notebooks. I save the variables with:
with open('opp2b.npy', 'wb') as f:
    np.save(f, mak)
    np.save(f, mp)

len(mak)
82
mak and mp are dictionaries with 82 entries of the same length. When loading, if I don't use allow_pickle=True, it will not load. So I use this:
with open('opp2b.npy', 'rb') as f:
    mak = np.load(f, allow_pickle=True)
    mp = np.load(f, allow_pickle=True)
and when I check the length I get
len(mak)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-16-bb967ce1f5ef> in <module>
----> 1 len(mak)
TypeError: len() of unsized object
I am not sure why the array is modified, but it is now unusable for what I need.
Upvotes: 0
Views: 238
Reputation: 155506
Per your comments, mak is not a numpy array at all. numpy.save is specifically documented to:

Save an array to a binary file in NumPy .npy format.
allow_pickle is for numpy arrays containing Python objects, but the .npy format is not intended to store things that aren't numpy arrays at all. To successfully store the dict, it's wrapping it in a 0D numpy "array", and that's what np.load is giving you. You could extract the original dict by doing:
mak = mak.item(0)  # mak = mak[0] doesn't work: 0D arrays don't support integer
                   # indexing, so it raises IndexError. .item(0) (like .item() or
                   # mak[()]) extracts the lone element as a plain Python object,
                   # which here is the original dict
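For illustration, here's a minimal, self-contained sketch of the round trip (the file name demo.npy is made up for the example); it shows np.save wrapping the dict in a 0D object array, and .item(0) recovering it:

import numpy as np

d = {'a': 1, 'b': 2}
with open('demo.npy', 'wb') as f:
    np.save(f, d)  # np.save coerces the dict to a 0D object array

with open('demo.npy', 'rb') as f:
    loaded = np.load(f, allow_pickle=True)

print(type(loaded), loaded.shape)  # <class 'numpy.ndarray'> () -- 0D, so len() fails
print(loaded.item(0) == d)         # True: .item(0) unwraps the original dict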
But really, that's trying to put a square peg in a round hole. If you're not storing numpy arrays, there's little benefit to the .npy format, if any. The main advantages it provides are:
1. Not needing pickle at all (unpickling untrusted data can execute arbitrary code; since you're forced to pass allow_pickle, that advantage goes away)
2. Storing raw array data more efficiently than pickle protocol 0 (that produced legal ASCII output, meaning only bytes of 127 or below, which made pickling raw binary data inefficient). As long as you're using protocol 2 or higher (which is binary, handles new-style classes efficiently, and is supported back to Python 2.3), it should store your data efficiently. As of Python 3.0, the default protocol is protocol 3 (rising to protocol 4 in 3.8), so if you're using a supported version of Python and don't specify the protocol, it will use 3 or 4 (both of which work fine; protocol 4 is better if you're pickling huge objects).
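To make the protocol point concrete, here's a small sketch comparing serialized sizes across pickle protocols (the exact byte counts are illustrative and vary by Python/numpy version):

import pickle
import numpy as np

data = {'a': np.array([0, 1, 2])}
for proto in (0, 2, pickle.HIGHEST_PROTOCOL):
    size = len(pickle.dumps(data, protocol=proto))
    print(f"protocol {proto}: {size} bytes")  # protocol 0 is ASCII-only and largest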
Since you aren't storing numpy arrays, just rely on the pickle module directly to store arbitrary data (with modern pickle protocols, which allow efficient binary storage, numpy arrays store efficiently enough anyway, so the .npy format isn't helping much, if at all; for some trivial test cases I tried, saving {'a': numpy.array([0,1,2])}, the .npy dump was over twice the size):
import pickle  # At top of file

with open('opp2b.pkl', 'wb') as f:  # Name with common pickle extension instead of .npy
    pickle.dump(mak, f)  # Argument order reversed from np.save
    pickle.dump(mp, f)
and then to load:
with open('opp2b.pkl', 'rb') as f:  # Matching change in name
    mak = pickle.load(f)
    mp = pickle.load(f)
This assumes you might in fact want to load only one data set or the other at a time; if you plan to store and load both all the time, you may as well condense it to a single write of a tuple of the relevant values (increasing the chance that objects duplicated across the two dicts can use back-references to avoid reserializing the same data multiple times), e.g.:
with open('opp2b.pkl', 'wb') as f:
    pickle.dump((mak, mp), f)
and:
with open('opp2b.pkl', 'rb') as f:
    mak, mp = pickle.load(f)
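As a rough illustration of the back-reference point (the shared array and dict contents here are invented for the demo), dumping both objects in one call lets pickle's memo deduplicate anything they share:

import pickle
import numpy as np

shared = np.arange(1000)  # some object referenced by both dicts
mak = {'x': shared}
mp = {'y': shared}

combined = len(pickle.dumps((mak, mp)))                    # one dump: memo back-reference
separate = len(pickle.dumps(mak)) + len(pickle.dumps(mp))  # two dumps: stored twice
print(combined < separate)  # True: the shared array is serialized only once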
Upvotes: 3