Reputation: 942
I would like to delete an element from an HDF5 dataset in Python. Below is my example code.
DeleteHDF5Dataset.py
# This code works: it deletes an entire HDF5 dataset from an HDF5 file
import os
import h5py

file_name = os.path.join('myfilepath', 'myfilename.hdf5')
f = h5py.File(file_name, 'r+')
del f['Log list']  # equivalent to f.__delitem__('Log list')
However, this is not what I want to do. 'Log list' is an HDF5 dataset that has several elements, and I would like to delete one or more of them individually, for example:
DeleteHDF5DatasetElement.py
# This code does not work, but I would like to achieve what it's trying to do
file_name = os.path.join('myfilepath', 'myfilename.hdf5')
f = h5py.File(file_name, 'r+')
print(f['Log list'][3]) # prints the correct dataset element
f.__delitem__('Log list')[3] # I want to delete element 3 of this HDF5 dataset
The best solution I can come up with is to create a temporary dataset, loop through the original dataset, and only add the entries I want to keep to the temp dataset, and then replace the old dataset with the new one. But this seems pretty clunky. Does anybody have a clean solution to do this? It seems like there should be a simple way to just delete an element.
Thanks, and sorry if any of my terminology is incorrect.
Upvotes: 2
Views: 4452
Reputation: 164623
It looks like you have an array of strings. It's not the recommended way of storing strings in HDF5, but let's assume you have no choice on how data is stored.
HDF5 prefers you to keep your array size fixed. Operations such as deleting arbitrary elements are expensive. In addition, space in an HDF5 file is not automatically freed when you delete data; the file does not shrink unless it is repacked, for example with the h5repack tool.
With those caveats in mind, if you still want to remove data stored in this format, you can extract the dataset as an array, delete an element, and then reassign the result to your dataset:
import numpy as np

arr = f['Log list'][:]     # extract to a NumPy array
res = np.delete(arr, 1)    # delete the element with index 1, i.e. the second element
del f['Log list']          # delete the existing dataset
f['Log list'] = res        # write the shortened array back under the same name
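For convenience, the read, delete, and rewrite steps above could be wrapped in a small helper. This is only a minimal sketch: the function name `delete_elements`, the temporary file, and the sample byte strings are made up for illustration and are not part of the original question. It assumes the whole dataset fits in memory, and it does not carry over the old dataset's attributes or creation options such as chunking or compression.

```python
import os
import tempfile

import h5py
import numpy as np


def delete_elements(h5file, name, indices):
    """Rewrite dataset `name` without the entries at `indices` (along axis 0).

    Hypothetical helper: attributes and creation options of the old
    dataset are NOT preserved.
    """
    arr = h5file[name][:]                   # read the whole dataset into memory
    kept = np.delete(arr, indices, axis=0)  # copy without the unwanted entries
    del h5file[name]                        # unlink the old dataset
    h5file.create_dataset(name, data=kept)  # write the shortened copy under the same name
    return kept.shape[0]


# Demo on a throwaway file holding fixed-length byte strings (illustrative data).
path = os.path.join(tempfile.mkdtemp(), "demo.hdf5")
with h5py.File(path, "w") as f:
    f["Log list"] = np.array([b"a", b"b", b"c", b"d", b"e"])

with h5py.File(path, "r+") as f:
    remaining = delete_elements(f, "Log list", [1, 3])  # drop the 2nd and 4th entries
    result = f["Log list"][:].tolist()

print(remaining, result)
```

Passing a list of indices removes several elements in one rewrite, which is cheaper than calling the helper once per element because the dataset is read and written only once.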
Upvotes: 1