Reputation: 451
I have an HDF5 file with 20 datasets, each with 200 rows of compound dtype ('<f4', '<f4', '<i4'),
where each component of the dtype represents a 1-D variable. I am finding that it takes about 2 seconds to open each file and assign each component of the compound dtype to its own variable, which seems remarkably slow to me. I'm using h5py and numpy to open the file and read from it into numpy arrays:
import numpy as np
import h5py
...
f = h5py.File("foo.hdf5", "r")
set1 = f["foo/bar"]
var1 = np.asarray([row[0] for row in set1])
var2 = np.asarray([row[1] for row in set1])
var3 = np.asarray([row[2] for row in set1])
Is there a faster way to extract the variables from these datasets?
Here is a screenshot of one of the datasets in HDFView:
Upvotes: 2
Views: 2862
Reputation: 451
A much faster way (~0.05 seconds instead of ~2 seconds) is to read the dataset into a NumPy structured array and then reference each field by name:
import numpy as np
import h5py
...
f = h5py.File("foo.hdf5", "r")
# one bulk read of the whole dataset into memory, instead of 200 row-by-row reads
set1 = np.asarray(f["foo/bar"])
var1 = set1["var1"]
var2 = set1["var2"]
var3 = set1["var3"]
Upvotes: 3