Basj

Reputation: 46463

Save compressed numpy array one after another (without having everything in RAM)

We can save many arrays, one after another, without having all of them in RAM at the same time with:

import numpy as np

with open('test.npy', 'wb') as f:
    A = compute_my_np_array(1)
    np.save(f, A)
    # we could even do: del A  (not needed here, since A is rebound on the next line anyway)

    A = compute_my_np_array(2)
    np.save(f, A)

but it is uncompressed. For a compressed save, numpy.savez_compressed requires all arrays to be available at the same time:

A = compute_my_np_array(1)
B = compute_my_np_array(2)
np.savez_compressed('test.npz', A=A, B=B)

TL;DR: how to save compressed numpy arrays without having all of them in RAM at the same time?

Upvotes: 1

Views: 582

Answers (1)

Jérôme Richard

Reputation: 50358

One solution is to use a package like gzip to open the file as a gzip stream instead of a raw binary file. Here is an example:

import gzip
import numpy as np

with gzip.open('test.npy.gz', 'wb') as f:
    A = compute_my_np_array(1)
    np.save(f, A)

The result is a gzip-compressed npy file. To read it back (with np.load, for example), you also need to open it with gzip.
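Putting this together with the question's sequential pattern, here is a minimal round-trip sketch: several arrays are written one after another into the same gzip stream (only one in RAM at a time) and read back in order. `compute_my_np_array` is the question's placeholder; a toy stand-in is defined here so the snippet runs.

```python
import gzip
import numpy as np

def compute_my_np_array(i):
    # stand-in for the question's (unspecified) computation
    return np.full((1000,), i, dtype=np.int64)

# write arrays one after another into a single gzip stream;
# each array can be freed before the next is computed
with gzip.open('test.npy.gz', 'wb') as f:
    for i in (1, 2):
        np.save(f, compute_my_np_array(i))

# read them back in the same order from the same stream
with gzip.open('test.npy.gz', 'rb') as f:
    A = np.load(f)
    B = np.load(f)
```

Each np.save call writes a self-contained npy record, so successive np.load calls on the open stream recover the arrays one by one.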

Note that gzip compression is a bit slow for large data, although the compression ratio is relatively good. Other compression standards may better fit your needs. For example, Zstd (faster) or LZ4 (much faster) could provide faster compression, certainly at the expense of a lower compression ratio (there is no free lunch).
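The np.save pattern above is not specific to gzip: it works with any compressor that exposes a file-like stream. As a sketch using only the standard library, here is the same idea with lzma (xz, slower but usually a higher ratio); the third-party zstandard and lz4 packages provide analogous open() functions.

```python
import lzma
import numpy as np

A = np.arange(10_000, dtype=np.float64)

# same streaming interface as gzip.open, different compressor
with lzma.open('test.npy.xz', 'wb') as f:
    np.save(f, A)

with lzma.open('test.npy.xz', 'rb') as f:
    B = np.load(f)
```

Because np.save and np.load only need read/write on a file-like object, swapping the compression backend is a one-line change.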

Upvotes: 1
