Reputation: 1971
The Python documentation for numpy.savez, which saves an .npz file, says:
The .npz file format is a zipped archive of files named after the variables they contain. The archive is not compressed and each file in the archive contains one variable in .npy format. [...]
When opening the saved .npz file with load a NpzFile object is returned. This is a dictionary-like object which can be queried for its list of arrays (with the .files attribute), and for the arrays themselves.
My question is: what is the point of numpy.savez?
Is it just a more elegant version (shorter command) to save multiple arrays, or is there a speed-up in the saving/reading process? Does it occupy less memory?
Upvotes: 50
Views: 56078
Reputation: 17478
There are two parts to the answer.
As we already read in the docs, the .npy format is:
the standard binary file format in NumPy for persisting a single arbitrary NumPy array on disk. ... The format is designed to be as simple as possible while achieving its limited goals. (sources)
And .npz is only a
simple way to combine multiple arrays into a single file, one can use ZipFile to contain multiple “.npy” files. We recommend using the file extension “.npz” for these archives. (sources)
So, .npz is just a ZipFile containing multiple “.npy” files. And this ZipFile can be either compressed (by using np.savez_compressed) or uncompressed (by using np.savez).
This is similar to a tarball on Unix-like systems: a tarball can be a plain uncompressed archive containing other files, or a compressed archive when combined with a compression program (gzip, bzip2, etc.).
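The claim that an .npz file is an ordinary zip archive of .npy members can be checked directly. The following is a minimal sketch, assuming an in-memory buffer instead of a file on disk; the array names a and b are made up for the illustration.

```python
import io
import zipfile

import numpy as np

# Write two small arrays into an in-memory .npz archive.
# The keyword names "a" and "b" are arbitrary example names.
buffer = io.BytesIO()
np.savez(buffer, a=np.arange(5), b=np.ones((2, 3)))
buffer.seek(0)

# Open the same bytes with the standard zipfile module and inspect the members.
with zipfile.ZipFile(buffer) as archive:
    member_names = sorted(archive.namelist())
    compress_types = {info.compress_type for info in archive.infolist()}

print(member_names)                              # one "<name>.npy" entry per array
print(compress_types == {zipfile.ZIP_STORED})    # np.savez stores members uncompressed
```

Each keyword argument becomes one member named after it, with the .npy suffix appended.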
NumPy also provides different APIs to produce these binary files:
np.save --> Save an array to a binary file in NumPy .npy format
np.savez --> Save several arrays into a single file in uncompressed .npz format
np.savez_compressed --> Save several arrays into a single file in compressed .npz format
np.load --> Load arrays or pickled objects from .npy, .npz or pickled files
If we skim the NumPy source code, under the hood:
def _savez(file, args, kwds, compress, allow_pickle=True, pickle_kwargs=None):
    ...
    if compress:
        compression = zipfile.ZIP_DEFLATED  # members are DEFLATE-compressed
    else:
        compression = zipfile.ZIP_STORED    # members are stored as-is
    ...

def savez(file, *args, **kwds):
    _savez(file, args, kwds, False)

def savez_compressed(file, *args, **kwds):
    _savez(file, args, kwds, True)
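The practical effect of the two zip modes can be seen by comparing archive sizes. This is a rough sketch, not a benchmark: archive_size is a hypothetical helper written for this example, and the all-zeros input is deliberately chosen because it compresses extremely well. Real data may shrink much less.

```python
import io

import numpy as np


def archive_size(save_func, **arrays):
    """Return how many bytes save_func writes for the given arrays."""
    buffer = io.BytesIO()
    save_func(buffer, **arrays)
    return len(buffer.getvalue())


# Highly repetitive data, so that DEFLATE has something to compress.
data = np.zeros(100_000, dtype=np.float64)

stored_size = archive_size(np.savez, data=data)
deflated_size = archive_size(np.savez_compressed, data=data)

print("raw array bytes:    ", data.nbytes)
print("np.savez:           ", stored_size)
print("np.savez_compressed:", deflated_size)
```

The uncompressed archive is slightly larger than the raw array because of the .npy header and zip bookkeeping, while the compressed one is much smaller for this input.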
Then back to the question:
If using np.savez, there is no extra compression on top of the .npy format; you just get a single archive file for the convenience of managing multiple related arrays.
If using np.savez_compressed, the file takes less space on disk, at the cost of more CPU time spent on compression (i.e. saving is a bit slower).
Upvotes: 68
Reputation: 3147
The main advantage is that the arrays are lazily loaded. That is, if you have an npz file with 100 arrays, you can load the file without actually loading any of the data. If you request a single array, only the data for that array is loaded.
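A minimal sketch of that access pattern, assuming a temporary file and two made-up array names (small and large): opening the archive only yields an NpzFile handle, and an array is read when it is indexed by name.

```python
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "arrays.npz")  # hypothetical file name
    np.savez(path, small=np.arange(3), large=np.zeros(1_000_000))

    # np.load on an .npz returns an NpzFile; array data is not read here.
    with np.load(path) as archive:
        names = sorted(archive.files)  # member names only
        small = archive["small"]       # reads and decodes just this member

print(names)
print(small)
```

The large array is never read from disk in this example, because it is never indexed.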
A downside of npz files is that they can't be memory mapped (using load(<file>, mmap_mode='r')), so for large arrays they may not be the best choice. For data where the arrays have a common shape I'd suggest taking a look at structured arrays. These can be memory mapped, allow accessing data with dict-like syntax (i.e., arr['field']), and are very memory efficient.
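As a sketch of the structured-array alternative, assuming two equally long columns with the made-up field names x and y: the array is saved as a single .npy file, which np.load can memory-map, and fields are then accessed by name.

```python
import os
import tempfile

import numpy as np

# Two columns of equal length stored side by side in one structured array.
# The field names "x" and "y" are invented for this example.
records = np.zeros(4, dtype=[("x", np.float64), ("y", np.int32)])
records["x"] = [0.5, 1.5, 2.5, 3.5]
records["y"] = [1, 2, 3, 4]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "records.npy")  # hypothetical file name
    np.save(path, records)

    # Unlike an .npz archive, a plain .npy file can be memory-mapped.
    mapped = np.load(path, mmap_mode="r")
    is_memmap = isinstance(mapped, np.memmap)
    y_values = mapped["y"].tolist()  # dict-like field access on the mapped data
    del mapped  # release the mapping before the temporary directory is removed

print(is_memmap)
print(y_values)
```

This only works when the arrays share a shape, since every record must contain every field.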
Upvotes: 11