I'm working on a project which needs to save and load multiple:
Now I'd like to store all my data in a single file (or transparent data storage), but I'm not sure how to store tables correctly.
How should I save the table's axis labels in a way that keeps the data independent of any particular programming language?
Your question is broad, but I will try to dispel some myths to get you started. I only have experience with Python, so my examples relate only to using HDF5 with Python.
> Pandas or PyTables can access HDF5 files, but they do not allow storing plain NumPy arrays, I think.
You are correct in that PyTables doesn't let you save a plain NumPy array without any additional overhead. But you don't need to use PyTables. h5py offers a NumPy-like interface for storing arrays in, and reading them from, HDF5 files.
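Here is a minimal sketch of what "NumPy-like" means in practice, assuming h5py and NumPy are installed. The file name is a hypothetical placeholder in the system temp directory. A group accepts dict-style assignment of an array, and a dataset can be sliced like an array, so only the requested part is read from disk.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical demo file in the system temp directory
path = os.path.join(tempfile.gettempdir(), 'numpy_like_demo.h5')
arr = np.arange(20).reshape(4, 5)

with h5py.File(path, 'w') as f:
    # Dict-style assignment creates a dataset from a NumPy array
    f['array'] = arr

with h5py.File(path, 'r') as f:
    dset = f['array']
    print(dset.shape, dset.dtype)  # metadata is available without reading the data
    block = dset[1:3, ::2]         # NumPy-style slicing reads only this selection
    whole = dset[()]               # read the entire dataset into a NumPy array

print(block)
```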
Store a NumPy array
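```python
import h5py
import numpy as np

arr = np.random.randint(0, 10, (1000, 1000))

# libver='latest' enables the newest file-format features, which can help performance,
# but the file may then not be readable by older versions of the HDF5 library
f = h5py.File('file.h5', 'w', libver='latest')
dset = f.create_dataset('array', shape=(1000, 1000), data=arr, chunks=(100, 100),
                        compression='gzip', compression_opts=9)
```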
There are compression and chunking options which you can explore further to optimise read/write performance and compression ratios, according to your requirements. Note, however, that gzip is one of the few compression filters that ship with every HDF5 installation, so it is the safest choice when the file must be readable by other tools and languages.
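To see what these options actually do, a sketch like the following (hypothetical file and dataset names) writes the same array once with the default contiguous, unfiltered layout and once chunked with gzip, then compares the bytes each dataset occupies on disk via the low-level `get_storage_size()` call. The exact compressed size depends on your data.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical demo file in the system temp directory
path = os.path.join(tempfile.gettempdir(), 'compression_demo.h5')
arr = np.random.randint(0, 10, (1000, 1000))

with h5py.File(path, 'w') as f:
    # Default layout: contiguous storage, no filters
    plain = f.create_dataset('plain', data=arr)
    # Chunked storage with the gzip (DEFLATE) filter
    packed = f.create_dataset('packed', data=arr, chunks=(100, 100),
                              compression='gzip', compression_opts=9)

    plain_layout = (plain.chunks, plain.compression)     # (None, None)
    packed_layout = (packed.chunks, packed.compression)  # ((100, 100), 'gzip')

    # Bytes actually allocated in the file for each dataset's data
    plain_bytes = plain.id.get_storage_size()
    packed_bytes = packed.id.get_storage_size()

print(plain_layout, plain_bytes)
print(packed_layout, packed_bytes)
```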
Store axis labels as attributes
Attributes are small named pieces of data attached directly to a dataset or group. Like datasets, they can hold a wide range of data, including scalars and arrays.
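```python
dset.attrs['Description'] = 'Some text snippet'
dset.attrs['X-Labels'] = np.arange(1000)  # one label per index along one axis
dset.attrs['Y-Labels'] = np.arange(1000)  # one label per index along the other axis
f.close()  # flush everything to disk once you are done writing
```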
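Reading the labels back is symmetric: attributes behave like a small dictionary, and array attributes come back as NumPy arrays. In this sketch the file name and the small array shape are hypothetical, and the mapping "X-Labels belong to axis 0, Y-Labels to axis 1" is only an assumed convention, since plain attributes do not record which axis they describe.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical demo file in the system temp directory
path = os.path.join(tempfile.gettempdir(), 'attrs_demo.h5')
data = np.random.rand(3, 4)

with h5py.File(path, 'w') as f:
    dset = f.create_dataset('array', data=data)
    dset.attrs['Description'] = 'Some text snippet'
    dset.attrs['X-Labels'] = np.arange(3)             # assumed: labels for axis 0
    dset.attrs['Y-Labels'] = np.linspace(0.0, 1.0, 4)  # assumed: labels for axis 1

with h5py.File(path, 'r') as f:
    dset = f['array']
    names = sorted(dset.attrs.keys())       # attribute names, like dict keys
    x_labels = dset.attrs['X-Labels']       # returned as a NumPy array
    y_labels = dset.attrs['Y-Labels']
    description = dset.attrs['Description']

print(names, x_labels, y_labels, description)
```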
Internally, the data is not stored as NumPy arrays but as typed binary data (kept contiguous or split into chunks) whose layout is defined by the HDF5 specification. As such, you will be able to read these files from any HDF5 API.
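If you also want other HDF5 tools to know which labels belong to which axis, HDF5 defines a Dimension Scales convention for exactly this, which h5py exposes through `Dataset.make_scale()` and `dset.dims`. The sketch below assumes h5py 2.9 or newer; the file name, the scale datasets `x` and `y`, and the axis labels are hypothetical. Each label array becomes its own 1-D dataset that is marked as a scale and attached to one axis of the data.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical demo file in the system temp directory
path = os.path.join(tempfile.gettempdir(), 'dimscale_demo.h5')
data = np.random.rand(3, 4)

with h5py.File(path, 'w') as f:
    dset = f.create_dataset('array', data=data)
    # Store each axis's labels as its own 1-D dataset
    f['x'] = np.arange(3)
    f['y'] = np.linspace(0.0, 1.0, 4)
    # Mark them as dimension scales
    f['x'].make_scale('x')
    f['y'].make_scale('y')
    # Attach each scale to the matching axis and give the axis a label
    dset.dims[0].attach_scale(f['x'])
    dset.dims[1].attach_scale(f['y'])
    dset.dims[0].label = 'x'
    dset.dims[1].label = 'y'

with h5py.File(path, 'r') as f:
    dset = f['array']
    axis_labels = [dset.dims[i].label for i in range(dset.ndim)]
    x_values = dset.dims[0][0][()]  # first scale attached to axis 0
    y_values = dset.dims[1][0][()]  # first scale attached to axis 1

print(axis_labels, x_values, y_values)
```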
It is worth noting that there are specific requirements to ensure strings are transportable; see "Strings in HDF5" in the h5py docs for more details.
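This matters for text axis labels in particular. According to the h5py docs, NumPy's fixed-width Unicode `U` dtype has no HDF5 equivalent and h5py raises an error rather than storing it, so the labels need an explicit string type. Here is a sketch of two portable options, variable-length UTF-8 via `h5py.string_dtype()` and fixed-length ASCII byte strings. The file name, attribute names and labels are hypothetical.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical demo file in the system temp directory
path = os.path.join(tempfile.gettempdir(), 'string_labels_demo.h5')
labels = ['alpha', 'beta', 'gamma']

with h5py.File(path, 'w') as f:
    dset = f.create_dataset('array', data=np.zeros((3, 2)))
    # Option 1: variable-length UTF-8 strings, via an explicit string dtype
    dset.attrs.create('row_labels', labels, dtype=h5py.string_dtype('utf-8'))
    # Option 2: fixed-length ASCII byte strings (NumPy 'S' dtype)
    dset.attrs['row_labels_ascii'] = np.array(labels, dtype='S')

with h5py.File(path, 'r') as f:
    attrs = f['array'].attrs
    # Recent h5py versions return str for variable-length string attributes;
    # decode defensively in case bytes come back
    utf8_labels = [s.decode() if isinstance(s, bytes) else s for s in attrs['row_labels']]
    # Fixed-length strings come back as bytes and must be decoded explicitly
    ascii_labels = [s.decode('ascii') for s in attrs['row_labels_ascii']]

print(utf8_labels, ascii_labels)
```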