Reputation: 685
I've inherited a bit of Python code that saves its output to a pickle with the following code:
def save_picklefile_and_close(self):
    pickle.dump({'p_index': self.p_index, 'p_val': self.p_val,
                 'z': self.z, 'w': self.w, 'nd': self.nd, 'nfg': self.nfg,
                 'lamb_scalar': self.lamb_scalar, 'PEAKSIZE': self.PEAKSIZE,
                 'N': self.N, 'S': self.S, 'F': self.F,
                 'FG': self.FG, 'q': self.q,
                 'pi': self.pi, 'phi': self.phi,
                 'p_scores': self.p_scores, 'w_scores': self.w_scores, 'z_scores': self.z_scores,
                 'likelihood': self.likelihood,
                 'p_true_val': self.p_true_val,
                 'p_true_index': self.p_true_index,
                 'w_true': self.w_true,
                 'phi_true': self.phi_true,
                 'pi_true': self.pi_true,
                 'z_true': self.z_true,
                 'z_true_per_data': self.z_true_per_data,
                 'data_for_debugging': self.data_for_debugging,
                 }, self.picklefile)
    self.picklefile.close()
This looks to me like a pickled dict, and there are plenty of examples of those online. All of the keys are in the pickle file, e.g. grep 'w_true' file.p returns a match. But when I try to read the pickle with this code:
f = open('file.p')
p = pickle.load(f)
print "type=", type(p)
print "shape=", p.shape
print "w_true=", p.w_true
all I get is a single numpy.ndarray:
type= <type 'numpy.ndarray'>
shape= (56, 147)
w_true=
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-39-83ddd085f0fd> in <module>()
3 print "type=", type(p)
4 print "shape=", p.shape
----> 5 print "w_true=", p.w_true
AttributeError: 'numpy.ndarray' object has no attribute 'w_true'
How do I access the other information in the pickle?
UPDATE: Yes, I'm definitely working with the same file. This modified code:
f = open('/path/to/file.p', mode='rt')
p = pickle.load(f)
print "type=", type(p)
print "shape=", p.shape
i = 0
for line in f:
    if 'w_true' in line:
        print "line no.", i, ":", line
    i += 1
Results in:
type= <type 'numpy.ndarray'>
shape= (56, 147)
line no. 7499 : sS'w_true'
But there's no reason the string w_true (and all of the other keys from the dict) should be in the file if it was only an ndarray, right? My only other thought is that maybe there's something in the pickle file header that's misleading the unpickler, i.e. maybe the output code is bugged? Here's the head of file.p:
$ head -n25 file.p
cnumpy.core.multiarray
_reconstruct
p1
(cnumpy
ndarray
p2
(I0
tS'b'
tRp3
(I1
(I56
I147
tcnumpy
dtype
p4
(S'i4'
I0
I1
tRp5
(I3
S'<'
NNNI-1
I-1
I0
tbI00
Upvotes: 1
Views: 2172
Reputation: 685
The problem is that the code which created the pickle opened only one file:
self.picklefile = open(picklefile_name, 'wb')
then wrote to that same file twice. First here:
if self.is_saving_pickle:
    pickle.dump(self.x, self.picklefile)  # a single numpy.ndarray
and then later here:
def save_picklefile_and_close(self):
    pickle.dump({'p_index': self.p_index, 'p_val': self.p_val,
                 # ... snip ...
                 'data_for_debugging': self.data_for_debugging,
                 }, self.picklefile)
    self.picklefile.close()
When the unpickler went to load file.p, it read and returned the first pickle (the ndarray) and then stopped before touching the second, so the dict was still sitting unread in the file. When the first pickle.dump() call is commented out, unpickling properly returns the dict from the second call.
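The behaviour can be reproduced in a few lines. This is only a minimal sketch, not the original code: it uses an in-memory io.BytesIO buffer as a stand-in for file.p, and a plain list and a one-key dict as stand-ins for self.x and the full results dict.

```python
import io
import pickle

# In-memory stand-in for file.p (assumption: the real file behaves the same way).
buf = io.BytesIO()

# Two separate dump() calls on the same open file object, as in the original code.
pickle.dump([1, 2, 3], buf)               # stand-in for self.x (the ndarray)
pickle.dump({'w_true': 'second'}, buf)    # stand-in for the dict of results

# A single load() reads exactly one pickle and returns it.
buf.seek(0)
obj = pickle.load(buf)
print(type(obj).__name__)                 # the first object, not the dict
```

The dict is not lost here; it is simply never reached by that one load() call.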
Update: based on @Mike McKerns' comment & links, it is also possible to call pickle.load() multiple times to retrieve multiple pickles made by multiple pickle.dump() calls within the same file.
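A rough sketch of that approach follows. The helper name load_all_pickles is made up for illustration, and the buffer and its contents are again stand-ins for file.p; with a real file you would open it in binary mode ('rb') and pass the file object in.

```python
import io
import pickle


def load_all_pickles(fileobj):
    """Return every object pickled back to back in an open binary file."""
    objects = []
    while True:
        try:
            objects.append(pickle.load(fileobj))
        except EOFError:
            # pickle.load raises EOFError when no further pickle remains.
            break
    return objects


# Stand-in for file.p: two dump() calls on the same file object.
buf = io.BytesIO()
pickle.dump([[1, 2], [3, 4]], buf)           # stand-in for self.x
pickle.dump({'w_true': [0.5, 0.25]}, buf)    # stand-in for the results dict
buf.seek(0)

first, second = load_all_pickles(buf)
print(second['w_true'])
```

With the file from the question, the second object returned this way would be the dict, so its entries are reached with p['w_true'] rather than p.w_true.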
Upvotes: 2