Reputation: 46503
This works to write and load a NumPy array plus metadata in a compressed .npz
file (compression is useless here because the data is random, but anyway):
import numpy as np
# save
D = {"x": np.random.random((10000, 1000)), "metadata": {"date": "20221123", "user": "bob", "name": "abc"}}
with open("test.npz", "wb") as f:
    np.savez_compressed(f, **D)
# load
D2 = np.load("test.npz", allow_pickle=True)
print(D2["x"])
print(D2["metadata"].item()["date"])
Let's say we want to change only one metadata field:
D["metadata"]["name"] = "xyz"
Is there a way to re-write to disk in test.npz only D["metadata"] and not the whole file, because D["x"] has not changed?
In my case, the .npz file can be 100 MB to 4 GB large, that's why it would be interesting to rewrite only the metadata.
Upvotes: 3
Views: 448
Reputation: 4171
Ultimately, the solution that I could get to work (thus far) is the one I originally thought of, with zipfile.
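import os
import zipfile
from contextlib import contextmanager

import numpy as np

@contextmanager
def archive_manager(archive_name: str, key: str):
    # Yield a temporary "<key>.npy" path for the caller to write to,
    # then append that file to the existing archive and delete it.
    s = f"{key}.npy"
    try:
        yield s
        # Append mode adds a new entry; the old "<key>.npy" entry stays in
        # the file (zipfile warns about the duplicate name).
        with zipfile.ZipFile(archive_name, "a") as f:
            f.write(s)
    finally:
        if os.path.exists(s):
            os.remove(s)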
Let's say we want to change metadata:
new_metadata = {"date": "20221123", "user": "bob", "name": "xyz"}
with archive_manager("test.npz", "metadata") as archive:
    np.save(archive, new_metadata)
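For reference, here is a self-contained sketch of the same append idea that serializes the new value in memory instead of going through a temporary .npy file on disk. The helper name `append_npz_member` and the `demo.npz` file are made up for illustration. It assumes that, when an archive holds two members with the same name, the most recently appended one is the one read back, which is how Python's `zipfile` resolves names as far as I can tell. The old entry is not removed, so the archive grows a little with every update.

```python
import io
import os
import tempfile
import warnings
import zipfile

import numpy as np


def append_npz_member(archive_name, key, value):
    """Append `value` as `<key>.npy` to an existing .npz without rewriting other members.

    Hypothetical helper for illustration; the previous `<key>.npy` entry stays in the file.
    """
    buf = io.BytesIO()
    # A dict is saved as a 0-d object array, which relies on pickling.
    np.save(buf, value, allow_pickle=True)
    with zipfile.ZipFile(archive_name, "a", compression=zipfile.ZIP_DEFLATED) as zf:
        with warnings.catch_warnings():
            # zipfile emits a UserWarning about the duplicate member name.
            warnings.simplefilter("ignore", UserWarning)
            zf.writestr(f"{key}.npy", buf.getvalue())


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.npz")
    np.savez_compressed(path, x=np.arange(6).reshape(2, 3), metadata={"name": "abc"})
    size_before = os.path.getsize(path)

    append_npz_member(path, "metadata", {"name": "xyz"})
    size_after = os.path.getsize(path)

    with np.load(path, allow_pickle=True) as data:
        name_after = data["metadata"].item()["name"]
        x_after = data["x"]
```

The large array `x` is never read or rewritten during the update; only the small metadata payload and the zip central directory are written.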
np.load returns an NpzFile, which is a lazy loader. However, NpzFile objects aren't directly writeable. We also cannot do something like D["metadata"] = new_metadata until D has been converted to a dict, and that loses the lazy functionality.
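A small sketch of that limitation, using a throwaway `demo.npz` (a made-up file name): item assignment on the loaded object fails, and building a dict from it reads every member, including the large array.

```python
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.npz")
    np.savez_compressed(path, x=np.zeros((4, 4)), metadata={"name": "abc"})

    with np.load(path, allow_pickle=True) as npz:
        try:
            # NpzFile is read-only, so item assignment is rejected.
            npz["metadata"] = {"name": "xyz"}
            assignment_failed = False
        except TypeError:
            assignment_failed = True

        # Converting to a dict forces every member to be read into memory.
        eager = {k: npz[k] for k in npz.files}

# The plain dict is writeable, but nothing in it is lazy any more.
eager["metadata"] = {"name": "xyz"}
```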
Upvotes: 2