Reputation: 301
I save a lot of models, matrices, and arrays offline in Python and came across these two functions. Can somebody help me by listing the pros and cons of numpy.save() and joblib.dump()?
Upvotes: 3
Views: 2420
Reputation: 35247
Here are the critical sections of code from joblib, which should shed some light.
    def _write_array(self, array, filename):
        if not self.compress:
            self.np.save(filename, array)
            container = NDArrayWrapper(os.path.basename(filename),
                                       type(array))
        else:
            filename += '.z'
            # Efficient compressed storage:
            # The meta data is stored in the container, and the core
            # numerics in a z-file
            _, init_args, state = array.__reduce__()
            # the last entry of 'state' is the data itself
            zfile = open(filename, 'wb')
            write_zfile(zfile, state[-1],
                        compress=self.compress)
            zfile.close()
            state = state[:-1]
            container = ZNDArrayWrapper(os.path.basename(filename),
                                        init_args, state)
        return container, filename
Basically, joblib.dump can optionally compress an array. Without compression, it stores the array to disk with numpy.save; with compression, it writes the array's raw data to a compressed z-file (the filename with '.z' appended) instead. In both cases, joblib.dump also stores an NDArrayWrapper (or a ZNDArrayWrapper for compression), which is a lightweight object that records the name of the file holding the array contents and the subclass of the array.
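To make the trade-off concrete, here is a minimal sketch that round-trips the same array through both functions and compares file sizes. It assumes a reasonably recent joblib, whose on-disk layout may differ from the excerpt quoted here; the all-zeros array, the file names, and compress=3 are arbitrary choices for illustration.

```python
import os
import tempfile

import joblib
import numpy as np

# Sample data: an all-zeros array compresses extremely well, which makes
# the size difference easy to see. Real data will usually compress less.
arr = np.zeros((1000, 1000), dtype=np.float64)

with tempfile.TemporaryDirectory() as tmp:
    npy_path = os.path.join(tmp, "arr.npy")
    joblib_path = os.path.join(tmp, "arr.joblib")
    model_path = os.path.join(tmp, "model.joblib")

    # numpy.save: plain, uncompressed .npy file holding one array.
    np.save(npy_path, arr)

    # joblib.dump: same array, compressed (levels range from 0 to 9).
    joblib.dump(arr, joblib_path, compress=3)

    npy_size = os.path.getsize(npy_path)
    joblib_size = os.path.getsize(joblib_path)
    print(f"numpy.save : {npy_size} bytes")
    print(f"joblib.dump: {joblib_size} bytes")

    # Both round-trip back to an equal array.
    from_npy = np.load(npy_path)
    from_joblib = joblib.load(joblib_path)

    # joblib.dump also accepts arbitrary picklable objects that contain
    # arrays, such as a dict standing in for a fitted model.
    joblib.dump({"weights": arr, "name": "demo"}, model_path, compress=3)
    model = joblib.load(model_path)
```

In short: numpy.save is the simpler choice for a single plain array, while joblib.dump is convenient when you want compression or need to persist a whole Python object that happens to contain arrays. Since joblib.dump is pickle-based, only load files that come from a source you trust.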
Upvotes: 3