Salil Navgire

Reputation: 301

What is the difference between numpy.save() and joblib.dump() in Python?

I save a lot of models, matrices, and arrays to disk in Python and came across these two functions. Can somebody help me by listing the pros and cons of numpy.save() and joblib.dump()?

Upvotes: 3

Views: 2420

Answers (1)

Mike McKerns

Reputation: 35247

Here's the critical section of code from joblib, which should shed some light.

def _write_array(self, array, filename):
    if not self.compress:
        self.np.save(filename, array)
        container = NDArrayWrapper(os.path.basename(filename),
                                   type(array))
    else:
        filename += '.z'
        # Efficient compressed storage:
        # The meta data is stored in the container, and the core
        # numerics in a z-file
        _, init_args, state = array.__reduce__()
        # the last entry of 'state' is the data itself
        zfile = open(filename, 'wb')
        write_zfile(zfile, state[-1],
                            compress=self.compress)
        zfile.close()
        state = state[:-1]
        container = ZNDArrayWrapper(os.path.basename(filename),
                                        init_args, state)
    return container, filename

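Basically, joblib.dump can optionally compress an array. Without compression, it writes the array to disk with numpy.save; with compression, it writes the raw array data to a compressed ".z" file via write_zfile (a compressed data file, not a zip archive). In both cases, joblib.dump also stores an NDArrayWrapper (or a ZNDArrayWrapper for compression), which is a lightweight object that records the name of the file holding the array contents, along with the array's subclass (plus the remaining __reduce__ metadata in the compressed case).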
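
To make the practical difference concrete, here is a minimal sketch that saves the same data both ways and compares file sizes. The file names, the all-zeros array, and the compress=3 level are assumptions chosen purely for illustration; the joblib version quoted above may lay out its files differently from a current release.

```python
import os
import tempfile

import joblib
import numpy as np

# Highly compressible example data (an assumption chosen for illustration).
array = np.zeros((1000, 1000))
# joblib.dump accepts arbitrary picklable objects, not just arrays.
model = {"weights": array, "name": "demo"}

with tempfile.TemporaryDirectory() as tmp:
    npy_path = os.path.join(tmp, "array.npy")
    raw_path = os.path.join(tmp, "model.joblib")
    compressed_path = os.path.join(tmp, "model_compressed.joblib")

    # numpy.save: a single array, written uncompressed in .npy format.
    np.save(npy_path, array)

    # joblib.dump: a whole Python object, without and with compression.
    joblib.dump(model, raw_path)
    joblib.dump(model, compressed_path, compress=3)

    # Read everything back while the temporary files still exist.
    loaded_array = np.load(npy_path)
    loaded_model = joblib.load(compressed_path)

    sizes = {
        "npy": os.path.getsize(npy_path),
        "raw": os.path.getsize(raw_path),
        "compressed": os.path.getsize(compressed_path),
    }

print(sizes)
```

So, roughly: numpy.save is the simpler choice for a plain array you want in the standard .npy format, while joblib.dump is more convenient when the array lives inside a larger object (such as a fitted model) or when on-disk compression matters.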

Upvotes: 3
