Alejandro

Reputation: 3392

Delete subgroup from HDF5 file in Python

I am trying to delete a subgroup that I've written to an HDF5 file using h5py in Python. For example, according to the documentation, the subgroup called "MyDataset" can be deleted with:

del subgroup["MyDataset"] 

I did this, and the subgroup is indeed no longer accessible. However, the file does not shrink. My question: is it possible to recover the space used by deleted subgroups with h5py, without having to rewrite the remaining subgroups into a completely new file? Below is a small example that illustrates what I mean:

import numpy as np
import h5py

myfile = h5py.File('file1.hdf5', 'w')  # "w" creates the file (truncating any existing one)
data = np.random.rand(int(1e6))
myfile.create_dataset("MyDataSet", data=data)
myfile.close()

Then I open the file and remove the previous entry:

myfile = h5py.File('file1.hdf5', 'a')  # "a" opens the existing file for reading and writing
del myfile["MyDataSet"]

and if you try to get the data using:

myfile["MyDataSet"][()]  # in h5py 3.0+, [()] replaces the removed .value attribute

you will find that the data is no longer accessible. However, the size of the file is the same before and after calling del.
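Since h5py 3.0 the Dataset.value attribute no longer exists: indexing with [()] reads a whole dataset, and a membership test with "in" checks whether a name still exists without relying on a KeyError. A minimal sketch, assuming h5py 3.0 or later; the group name MyGroup is hypothetical and only shows that del works the same way on a member of a subgroup:

```python
import h5py
import numpy as np

path = "file1.hdf5"

# Build a small file with one top-level dataset and one nested dataset.
with h5py.File(path, "w") as f:
    f.create_dataset("MyDataSet", data=np.random.rand(10))
    grp = f.create_group("MyGroup")  # hypothetical subgroup for illustration
    grp.create_dataset("MyDataset", data=np.arange(5))

with h5py.File(path, "a") as f:
    values = f["MyGroup/MyDataset"][()]  # read the whole dataset into a NumPy array
    del f["MyDataSet"]                   # unlink a top-level dataset
    del f["MyGroup"]["MyDataset"]        # unlink a member of a subgroup
    top_exists = "MyDataSet" in f        # membership test instead of catching KeyError
    nested_exists = "MyDataset" in f["MyGroup"]

print(values, top_exists, nested_exists)
```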

Upvotes: 2

Views: 4183

Answers (1)

unutbu

Reputation: 879591

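del myfile["MyDataSet"] modifies the File object, but does not immediately modify the underlying file1.hdf5 file. The change is not written to file1.hdf5 until myfile.close() is called. In your second snippet the file is never closed, so its size is checked before the deletion reaches the disk.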
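The same effect can be seen without a with-statement by closing the file explicitly before measuring it. A minimal sketch; size_after_delete is a hypothetical helper, not part of h5py, and the exact "after" size depends on the HDF5 version and on where the freed data sits in the file:

```python
import os

import h5py
import numpy as np


def size_after_delete(path, name):
    """Hypothetical helper: delete `name` from the HDF5 file at `path` and
    return the file size in bytes before and after the deletion."""
    size_before = os.path.getsize(path)
    f = h5py.File(path, "a")  # pass the mode explicitly; "r" is the default in h5py 3.0+
    try:
        del f[name]
    finally:
        f.close()  # pending changes are written to disk here
    return size_before, os.path.getsize(path)


path = "file1.hdf5"
with h5py.File(path, "w") as f:
    # 1e6 float64 values occupy 8,000,000 bytes of raw data
    f.create_dataset("MyDataSet", data=np.random.rand(int(1e6)))

before, after = size_after_delete(path, "MyDataSet")
print(before, after)

with h5py.File(path, "r") as f:
    still_there = "MyDataSet" in f
```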

If you use a with-statement, myfile.close() will be called automatically for you when Python leaves the with-statement:

import numpy as np
import h5py
import os

path = 'file1.hdf5'
with h5py.File(path, "w") as myfile:
    data = np.random.rand(int(1e6))
    myfile.create_dataset("MyDataSet", data=data)
    myfile.flush()  # make sure the data is on disk before measuring the size
    print(os.path.getsize(path))

with h5py.File(path, "a") as myfile:
    del myfile["MyDataSet"]
    try:
        myfile["MyDataSet"][()]  # h5py 3.0+ removed Dataset.value; [()] reads the data
    except KeyError:
        # the dataset has been deleted, so the lookup fails
        pass

print(os.path.getsize(path))

prints

8002144         <-- original file size
2144            <-- new file size

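Notice that the first time, opening the File in write mode ("w") creates a new file (truncating any existing one). The second time, opening the File in append mode ("a") allows reading the existing file and modifying it. In older h5py versions "a" was the default mode; since h5py 3.0 the default is read-only ("r"), so it is safest to pass the mode explicitly.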
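A minimal sketch of how the modes behave, assuming h5py 3.0 or later; the file name modes_demo.hdf5 and the dataset names are made up for illustration:

```python
import h5py
import numpy as np

path = "modes_demo.hdf5"  # hypothetical file name

# "w": create the file, truncating it if it already exists
with h5py.File(path, "w") as f:
    f.create_dataset("first", data=np.arange(3))

# "a": open for reading and writing, keeping the existing contents
with h5py.File(path, "a") as f:
    f.create_dataset("second", data=np.arange(3))
    names_after_append = sorted(f.keys())

# "w" again: everything written before is discarded
with h5py.File(path, "w") as f:
    names_after_truncate = sorted(f.keys())

# "r": read-only, the default mode since h5py 3.0
with h5py.File(path, "r") as f:
    mode_when_reading = f.mode

print(names_after_append, names_after_truncate, mode_when_reading)
```

Opening with "w" a second time is exactly what would wipe out the data you meant to keep, which is why the answer switches to "a" for the deletion step.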

Upvotes: 4
