Reputation: 2712
I have a NumPy array of dtype numpy.float32 that takes about 26 GiB when saved as an uncompressed *.npz file, and numpy.savez() fails with:
OSError: Failed to write to /tmp/tmpl9v3xsmf-numpy.npy: 6998400000 requested and 3456146404 written
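A quick calculation puts the numbers in the error message in perspective. This sketch assumes they count float32 elements rather than bytes, because the "requested" figure equals the array length 6998400000. Each count is therefore multiplied by 4 bytes:

```python
import numpy as np

GIB = 2**30
itemsize = np.dtype(np.float32).itemsize  # 4 bytes per float32 element

requested = 6998400000  # element count from the error message
written = 3456146404    # elements that reached /tmp before the failure

requested_gib = requested * itemsize / GIB
written_gib = written * itemsize / GIB
print(f"{requested_gib:.2f} GiB requested, {written_gib:.2f} GiB written")
# prints "26.07 GiB requested, 12.88 GiB written"
```

Under that assumption, roughly half of the data was written before the failure. This suggests /tmp had only about 13 GiB free.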
I suppose saving it compressed might save the day, but numpy.savez_compressed() fails too:
OSError: Failed to write to /tmp/tmp591cum2r-numpy.npy: 6998400000 requested and 3456157668 written
because numpy.savez_compressed() first writes the array uncompressed to a temporary file.
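Because the uncompressed copy is staged in the temporary directory, checking the free space there first turns a late OSError into an early, cheap decision. The sketch below rests on a few assumptions. It assumes the staging file lands in tempfile.gettempdir(), which matches the /tmp paths above but may not hold everywhere. The helper name tmp_has_room is hypothetical, and the 5% margin for the .npy header is arbitrary.

```python
import shutil
import tempfile

import numpy as np


def tmp_has_room(arr, margin=1.05):
    """Hypothetical helper: True if the temp dir can hold an uncompressed copy of arr.

    Assumes the staging file is written to tempfile.gettempdir(); the margin
    is an arbitrary allowance for the .npy header.
    """
    free = shutil.disk_usage(tempfile.gettempdir()).free
    return free >= arr.nbytes * margin
```

For example, tmp_has_room(A) returning False means the save would almost certainly fail partway through.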
I do not consider the obvious "use additional storage" an answer. ;)
[EDIT]
The low-memory tag refers to disk space, not RAM.
Upvotes: 0
Views: 5359
Reputation: 295363
With the addition of ZipFile.open(..., mode='w') in Python 3.6, you can do better:
import zipfile

import numpy as np


def saveCompressed(fh, **namedict):
    with zipfile.ZipFile(fh, mode="w", compression=zipfile.ZIP_DEFLATED,
                         allowZip64=True) as zf:
        for k, v in namedict.items():
            # Stream each array straight into its archive member, with no
            # temporary file and no full in-memory copy of the serialized data.
            with zf.open(k + '.npy', 'w', force_zip64=True) as buf:
                np.lib.format.write_array(buf,
                                          np.asanyarray(v),
                                          allow_pickle=False)
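Here is a minimal sketch of the Python 3.6+ mechanism this answer relies on, run against a small in-memory archive. Two stand-ins are assumptions for illustration: io.BytesIO replaces the real output file, and a 1,000,000-element array of ones replaces the real data. The member is deflated as it is written, and the resulting archive loads with np.load:

```python
import io
import zipfile

import numpy as np

out = io.BytesIO()  # stands in for the real output file
arr = np.ones(1_000_000, dtype=np.float32)

with zipfile.ZipFile(out, mode="w", compression=zipfile.ZIP_DEFLATED,
                     allowZip64=True) as zf:
    # force_zip64=True reserves ZIP64 fields up front, because the member's
    # final size is not known when writing starts and may exceed 4 GiB.
    with zf.open("A.npy", "w", force_zip64=True) as member:
        np.lib.format.write_array(member, arr, allow_pickle=False)
    info = zf.getinfo("A.npy")

print(info.file_size, info.compress_size)

out.seek(0)
loaded = np.load(out)["A"]
```

Constant data like this compresses to a tiny fraction of file_size. Real data will usually compress much less.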
Upvotes: 1
Reputation: 2712
Note: I would be more than happy to accept a more RAM-efficient solution.
I went through the numpy.savez_compressed() source and reimplemented part of its functionality:
import io
import zipfile

import numpy as np


def saveCompressed(fh, **namedict):
    with zipfile.ZipFile(fh,
                         mode="w",
                         compression=zipfile.ZIP_DEFLATED,
                         allowZip64=True) as zf:
        for k, v in namedict.items():
            # Serialize the array in memory (no temporary file on disk),
            # then deflate the bytes into the archive.
            buf = io.BytesIO()
            np.lib.format.write_array(buf,
                                      np.asanyarray(v),
                                      allow_pickle=False)
            zf.writestr(k + '.npy',
                        buf.getvalue())
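The io.BytesIO buffer in this version is what makes it RAM-hungry. It holds the complete serialized array (a small .npy header plus every data byte) in memory alongside the original, so peak usage is at least about twice the array size. A small illustrative check, using an arbitrarily chosen 1000-element array:

```python
import io

import numpy as np

arr = np.ones(1000, dtype=np.float32)
buf = io.BytesIO()
np.lib.format.write_array(buf, arr, allow_pickle=False)

# Bytes now held by the buffer: a full second copy of the data plus the header.
serialized = buf.tell()
header = serialized - arr.nbytes
print(arr.nbytes, serialized, header)
```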
It causes my system to swap, but at least I am able to store my data (dummy data used in the example):
>>> A = np.ones(12 * 6 * 6 * 1 * 6 * 6 * 10000* 5* 9, dtype=np.float32)
>>> saveCompressed(open('test.npz', 'wb'), A=A)
>>> A = np.load('test.npz')['A']
>>> A.shape
(6998400000,)
>>> (A == 1).all()
True
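The check np.load('test.npz')['A'] reads the whole array back into RAM. On a memory-constrained machine, the saved member could instead be verified chunk by chunk. The sketch below rests on several assumptions. It handles only .npy format versions 1.0 and 2.0 and non-object dtypes, and it is built on the public numpy.lib.format header readers. The names iter_npz_chunks and chunk_elems are hypothetical.

```python
import io
import math
import zipfile

import numpy as np


def iter_npz_chunks(file, name, chunk_elems=1 << 20):
    """Hypothetical helper: yield 1-D chunks of array `name` from an .npz file.

    Chunks follow the on-disk element order (C or Fortran, per the header).
    """
    with zipfile.ZipFile(file) as zf, zf.open(name + ".npy") as fp:
        version = np.lib.format.read_magic(fp)
        if version == (1, 0):
            shape, _fortran, dtype = np.lib.format.read_array_header_1_0(fp)
        elif version == (2, 0):
            shape, _fortran, dtype = np.lib.format.read_array_header_2_0(fp)
        else:
            raise ValueError(f"unsupported .npy format version {version}")
        if dtype.hasobject:
            raise ValueError("object arrays cannot be read chunk-wise")
        remaining = math.prod(shape)  # total element count (1 for a 0-d array)
        while remaining > 0:
            n = min(chunk_elems, remaining)
            raw = fp.read(n * dtype.itemsize)  # decompresses only this chunk
            if len(raw) != n * dtype.itemsize:
                raise EOFError("archive member is truncated")
            yield np.frombuffer(raw, dtype=dtype)
            remaining -= n
```

Mirroring the check above: all((chunk == 1).all() for chunk in iter_npz_chunks('test.npz', 'A')). This keeps only about chunk_elems elements in memory at a time.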
Upvotes: 1