Reputation: 3392
I am trying to delete a subgroup that I've written to an HDF5 file using h5py in Python. For example, according to the documentation, the subgroup called "MyDataset" can be deleted with:
del subgroup["MyDataset"]
I did that, and the subgroup is indeed no longer accessible. However, the file does not shrink. My question: is it possible to recover the space from deleted subgroups using h5py without having to rewrite the remaining subgroups into a completely new file? Below is a small example that illustrates what I mean:
import numpy as np
import h5py
myfile = h5py.File('file1.hdf5', 'w')
data = np.random.rand(int(1e6))
myfile.create_dataset("MyDataSet", data=data)
myfile.close()
Then I open the file and remove the previous entry:
myfile = h5py.File('file1.hdf5', 'a')
del myfile["MyDataSet"]
and if you try to get the data using:
myfile["MyDataSet"][()]
you will see that the data is no longer accessible. However, if you check the size of the file, it is the same before and after calling del.
Upvotes: 2
Views: 4183
Reputation: 879591
del myfile["MyDataSet"] modifies the File object, but does not modify the underlying file1.hdf5 file. The file1.hdf5 file is not modified until myfile.close() is called.
If you use a with-statement, myfile.close() will be called automatically for you when Python leaves the with-statement:
import numpy as np
import h5py
import os
path = 'file1.hdf5'
# Create the file and write one dataset; the file is closed on leaving the block.
with h5py.File(path, "w") as myfile:
    data = np.random.rand(int(1e6))
    myfile.create_dataset("MyDataSet", data=data)
print(os.path.getsize(path))
# Reopen the file, delete the dataset, and close the file again.
with h5py.File(path, "a") as myfile:
    del myfile["MyDataSet"]
    try:
        myfile["MyDataSet"][()]
    except KeyError as err:
        # print(err)
        pass
print(os.path.getsize(path))
prints
8002144 <-- original file size
2144 <-- new file size
Notice that the first time, opening the File in write mode ("w") creates a new file; the second time, opening the File in append mode ("a", the default in h5py versions before 3.0) allows reading the existing file and modifying it.
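To make the difference between the two modes concrete, here is a minimal sketch. The helper name list_keys_after_reopen and the file name modes_demo.hdf5 are made up for illustration and are not part of h5py. It writes one dataset, reopens the file in the given mode, and reports which keys are still there.

```python
import os
import tempfile

import h5py
import numpy as np


def list_keys_after_reopen(mode):
    """Create a file with one dataset, reopen it with `mode`, return its keys.

    Hypothetical helper, written only for this illustration.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "modes_demo.hdf5")  # made-up file name
        # "w" creates the file (and would truncate an existing one).
        with h5py.File(path, "w") as f:
            f.create_dataset("MyDataSet", data=np.arange(10))
        # Reopen with the mode under test and list the top-level names.
        with h5py.File(path, mode) as f:
            return list(f.keys())


print(list_keys_after_reopen("a"))  # existing dataset is kept
print(list_keys_after_reopen("w"))  # file was truncated, so no keys remain
```

Passing the mode explicitly, as above, avoids depending on the default, which differs between h5py versions.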
Upvotes: 4