Reputation: 281
I am trying to resize a dataset and store new values using the h5py package in Python. My dataset grows at every time instance, and I would like to append to the .h5 file using the resize function. However, I run into errors with my approach. The variable dset is a list of datasets.
import os
import h5py
import numpy as np

path = './out.h5'
os.remove(path)

def create_h5py(path):
    with h5py.File(path, "a") as hf:
        grp = hf.create_group('left')
        dset = []
        dset.append(grp.create_dataset('voltage', (10**4,3), maxshape=(None,3), dtype='f', chunks=(10**4,3)))
        dset.append(grp.create_dataset('current', (10**4,3), maxshape=(None,3), dtype='f', chunks=(10**4,3)))
    return dset

if __name__ == '__main__':
    dset = create_h5py(path)
    for i in range(3):
        if i == 0:
            dset[0][:] = np.random.random(dset[0].shape)
            dset[1][:] = np.random.random(dset[1].shape)
        else:
            dset[0].resize(dset[0].shape[0]+10**4, axis=0)
            dset[0][-10**4:] = np.random.random((10**4,3))
            dset[1].resize(dset[1].shape[0]+10**4, axis=0)
            dset[1][-10**4:] = np.random.random((10**4,3))
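EDIT
Thanks to tel, I was able to solve this. In create_h5py, replace the context-manager line
with h5py.File(path, "a") as hf:
by the plain assignment
hf = h5py.File(path, "a")
so that the file is still open when the returned datasets are used.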
Upvotes: 2
Views: 2725
Reputation: 8006
@tel provided an elegant solution to the problem. I outlined a simpler approach in my comments below his answer; it is easier for a beginner to code (and understand). Basically, there are a few minor changes to @Maxtron's original code. The modifications are:
- Move with h5py.File(path, "a") as hf: to the __main__ routine
- Pass hf to create_h5py(hf)
- Guard os.remove() with an existence test to avoid errors if the h5 file doesn't exist
My suggested modifications are below:
import h5py, os
import numpy as np

path = './out.h5'
# test existence of H5 file before deleting
if os.path.isfile(path):
    os.remove(path)

def create_h5py(hf):
    grp = hf.create_group('left')
    dset = []
    dset.append(grp.create_dataset('voltage', (10**4,3), maxshape=(None,3), dtype='f', chunks=(10**4,3)))
    dset.append(grp.create_dataset('current', (10**4,3), maxshape=(None,3), dtype='f', chunks=(10**4,3)))
    return dset

if __name__ == '__main__':
    with h5py.File(path, "a") as hf:
        dset = create_h5py(hf)
        for i in range(3):
            if i == 0:
                dset[0][:] = np.random.random(dset[0].shape)
                dset[1][:] = np.random.random(dset[1].shape)
            else:
                dset[0].resize(dset[0].shape[0]+10**4, axis=0)
                dset[0][-10**4:] = np.random.random((10**4,3))
                dset[1].resize(dset[1].shape[0]+10**4, axis=0)
                dset[1][-10**4:] = np.random.random((10**4,3))
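The resize-then-write pair in the loop above is repeated for each dataset, so it could be factored into a small helper. The following is only a sketch under a few assumptions: the helper name append_rows, the scratch file append_demo.h5, and starting from an empty (0, 3) dataset are all my own choices for illustration, not part of the original code.

```python
import os
import tempfile

import h5py
import numpy as np


def append_rows(dset, block):
    """Grow `dset` along axis 0 and write `block` into the newly added rows.

    `append_rows` is a hypothetical helper name, not an h5py function.
    """
    n_new = block.shape[0]
    dset.resize(dset.shape[0] + n_new, axis=0)
    dset[-n_new:] = block


# Hypothetical scratch file so the sketch does not touch ./out.h5
demo_path = os.path.join(tempfile.mkdtemp(), "append_demo.h5")

with h5py.File(demo_path, "w") as hf:
    # Start empty; maxshape=(None, 3) allows axis 0 to grow without limit
    voltage = hf.create_dataset("left/voltage", shape=(0, 3), maxshape=(None, 3),
                                dtype="f", chunks=(10**4, 3))
    for _ in range(3):
        append_rows(voltage, np.random.random((10**4, 3)))
    # Read the shape while the file is still open
    final_shape = voltage.shape

print(final_shape)
```

Starting from zero rows removes the special case for i == 0, since every iteration is then an append.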
Upvotes: 2
Reputation: 13999
Not sure about the rest of your code, but you can't use the context manager pattern (i.e. with h5py.File(foo) as bar:) within a function that returns a dataset. As you point out in the comment under your question, this means that by the time you try to access the dataset, the actual HDF5 file will have already closed. The dataset objects in h5py are like live views into the file, so they require that the file remain open in order to use them. That is why you're getting errors.
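To see the problem in isolation, here is a minimal sketch that returns a dataset from inside a with block and then checks whether the handle is still usable. The function name create_and_return and the scratch file name are made up for this illustration; the check relies on h5py objects evaluating as false once their underlying identifier has been closed.

```python
import os
import tempfile

import h5py

# Hypothetical scratch file, used only for this demonstration
demo_path = os.path.join(tempfile.mkdtemp(), "closed_demo.h5")


def create_and_return(path):
    """Mimic the question's pattern: return a dataset from inside `with`."""
    with h5py.File(path, "w") as hf:
        dset = hf.create_dataset("voltage", (10, 3), maxshape=(None, 3), dtype="f")
        open_inside = bool(dset)  # handle is valid while the file is open
    # The file has been closed here, and the dataset handle along with it
    return dset, open_inside


dset, open_inside = create_and_return(demo_path)
still_open = bool(dset)
print(open_inside, still_open)

# Any attempt to use the stale handle fails; the exact exception type
# is not relied upon here, so it is caught generically.
try:
    dset.resize(20, axis=0)
except Exception as exc:
    print("resize failed:", type(exc).__name__)
```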
It's a good idea to always interact with files within a managed context (i.e. within a with clause). If your code throws an error, the context manager will (almost always) ensure that the file is closed. This helps avoid potential data loss resulting from a crash.
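As a rough illustration of that guarantee, the sketch below raises an error in the middle of a with block and then checks that the file was closed and that the data written before the error can be read back. The file name crash_demo.h5 and the simulated RuntimeError are assumptions made for the example.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical scratch file, used only for this demonstration
demo_path = os.path.join(tempfile.mkdtemp(), "crash_demo.h5")

try:
    with h5py.File(demo_path, "w") as hf:
        hf.create_dataset("voltage", data=np.ones((5, 3), dtype="f"))
        raise RuntimeError("simulated crash")  # stands in for a real failure
except RuntimeError:
    pass

# The context manager closed the file even though the block raised
closed_after_error = not bool(hf)

# Reopen read-only to confirm the data written before the error is intact
with h5py.File(demo_path, "r") as hf_check:
    recovered_shape = hf_check["voltage"].shape

print(closed_after_error, recovered_shape)
```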
In your case, you can have your cake (encapsulate your dataset creation routines in a separate function) and eat it too (interact with the HDF5 file within a managed context) by writing your own context manager to look after the file for you.
It's actually pretty simple to code. Any Python object that implements the __enter__ and __exit__ methods is a valid context manager. Here's a complete working version:
import os
import h5py
import numpy as np

path = './out.h5'
try:
    os.remove(path)
except OSError:
    pass

class H5PYManager:
    def __init__(self, path, method='a'):
        self.hf = h5py.File(path, method)

    def __enter__(self):
        # when you call `with H5PYManager(foo) as bar`, the return of this method will be assigned to `bar`
        return self.create_datasets()

    def __exit__(self, type, value, traceback):
        # this method gets called when you exit the `with` clause, including when an error is raised
        self.hf.close()

    def create_datasets(self):
        grp = self.hf.create_group('left')
        return [grp.create_dataset('voltage', (10**4,3), maxshape=(None,3), dtype='f', chunks=(10**4,3)),
                grp.create_dataset('current', (10**4,3), maxshape=(None,3), dtype='f', chunks=(10**4,3))]

if __name__ == '__main__':
    with H5PYManager(path) as dset:
        for i in range(3):
            if i == 0:
                dset[0][:] = np.random.random(dset[0].shape)
                dset[1][:] = np.random.random(dset[1].shape)
            else:
                dset[0].resize(dset[0].shape[0]+10**4, axis=0)
                dset[0][-10**4:] = np.random.random((10**4,3))
                dset[1].resize(dset[1].shape[0]+10**4, axis=0)
                dset[1][-10**4:] = np.random.random((10**4,3))
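If writing a class feels heavy, roughly the same behavior can probably be had from a generator function decorated with contextlib.contextmanager from the standard library. This is a sketch of that alternative, not the class above rewritten exactly: the name open_datasets and the scratch file are made up for illustration, and the file is opened when the with block is entered rather than when the manager object is constructed.

```python
import os
import tempfile
from contextlib import contextmanager

import h5py
import numpy as np


@contextmanager
def open_datasets(path, mode="a"):
    """Hypothetical helper: open the file, yield its datasets, always close."""
    hf = h5py.File(path, mode)
    try:
        grp = hf.create_group("left")
        yield [
            grp.create_dataset(name, (10**4, 3), maxshape=(None, 3),
                               dtype="f", chunks=(10**4, 3))
            for name in ("voltage", "current")
        ]
    finally:
        # Runs on normal exit and when the `with` body raises
        hf.close()


# Hypothetical scratch file so the sketch does not touch ./out.h5
demo_path = os.path.join(tempfile.mkdtemp(), "ctx_demo.h5")

with open_datasets(demo_path) as dsets:
    for i in range(3):
        for d in dsets:
            if i > 0:
                # Grow by one block before writing the next chunk of rows
                d.resize(d.shape[0] + 10**4, axis=0)
            d[-10**4:] = np.random.random((10**4, 3))
    shapes = [d.shape for d in dsets]

print(shapes)
```

The try/finally around the yield plays the role of __exit__: the file is closed whether the body finishes normally or raises.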
Upvotes: 3