Reputation: 5975
Do I understand correctly that HDF5 files should be closed manually, like this:
import h5py
file = h5py.File('test.h5', 'r')
...
file.close()
From the documentation: "HDF5 files work generally like standard Python file objects. They support standard modes like r/w/a, and should be closed when they are no longer in use."
But I wonder: will garbage collection invoke file.close() when the script terminates or when file is overwritten?
Upvotes: 2
Views: 3543
Reputation: 468
This was answered in the comments a long time ago by @kcw78, but I thought I might as well write it up as a quick answer for anyone else reaching this.
As @kcw78 says, you should explicitly close files when you are done with them by calling file.close(). From previous experience, I can tell you that h5py files are usually closed properly anyway when the script terminates, but occasionally a file would end up corrupted (although I'm not sure whether that ever happens when the file is only opened in 'r' mode). Better not to leave it to chance!
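If you do go the explicit file.close() route, a try/finally block makes sure the close still happens when something in between raises. This is only a minimal sketch: the temporary path and the variable names are made up for illustration, and it relies on an h5py.File object evaluating as False once it has been closed.

```python
import os
import tempfile

import h5py

# Hypothetical scratch file so the sketch does not touch real data
path = os.path.join(tempfile.mkdtemp(), "example.h5")

f = h5py.File(path, "w")
try:
    f["data"] = [1, 2, 3]
finally:
    f.close()  # runs even if the write above raises

# A closed h5py.File is falsy, so this is a cheap way to check its state
file_is_open = bool(f)
print(file_is_open)
```

A with block does the same thing with less typing, which is why it is usually the nicer option.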
As @kcw78 also suggests, using a context manager is a good way to go if you want to be safe. In either case, you need to be careful to actually extract the data you want before letting the file close.
e.g.
import h5py

with h5py.File('test.h5', 'w') as f:
    f['data'] = [1, 2, 3]

# Letting the file close and reopening in read only mode for example purposes
with h5py.File('test.h5', 'r') as f:
    dataset = f.get('data')  # get the h5py.Dataset
    data = dataset[:]  # Copy the array into memory
    print(dataset.shape, data.shape)  # appear to behave the same
    print(dataset[0], data[0])  # appear to behave the same

print(data[0], data.shape)  # Works same as above
print(dataset[0], dataset.shape)  # Raises ValueError: Not a dataset
dataset[0] raises an error here because dataset is an instance of h5py.Dataset, which is tied to f and was closed at the same time f was closed. data, on the other hand, is just a numpy array containing only the data part of the dataset (i.e. no additional attributes).
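Since the numpy copy drops the attributes, here is a rough sketch of pulling those out as well before the file closes. The 'units' attribute and the temporary path are invented for the example; the idea is just that dict(dataset.attrs) gives you a plain dictionary that outlives the file.

```python
import os
import tempfile

import h5py

# Hypothetical scratch file and attribute name, for illustration only
path = os.path.join(tempfile.mkdtemp(), "attrs_example.h5")

with h5py.File(path, "w") as f:
    dset = f.create_dataset("data", data=[1, 2, 3])
    dset.attrs["units"] = "m"

with h5py.File(path, "r") as f:
    dataset = f["data"]
    data = dataset[:]            # numpy copy of the values
    attrs = dict(dataset.attrs)  # plain dict copy of the attributes

# Both copies are still usable after the file has closed
print(data, attrs)

# The Dataset handle itself was closed together with the file
dataset_still_valid = bool(dataset)
print(dataset_still_valid)
```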
Upvotes: 2