DavidH

Reputation: 451

What is a fast way in Python to read HDF5 compound dtype arrays?

I have an HDF5 file with 20 datasets, each holding 200 rows of compound dtype ('<f4', '<f4', '<i4'), where each field of the dtype represents a 1-D variable. I find that it takes about 2 seconds to open each file and assign each field to its own variable, which seems remarkably slow. I'm using h5py and NumPy to open the file and read it into NumPy arrays:

import numpy as np
import h5py
...
f = h5py.File("foo.hdf5", "r")
set1 = f["foo/bar"]
# Builds each variable by iterating over the dataset row by row in Python:
var1 = np.asarray([row[0] for row in set1])
var2 = np.asarray([row[1] for row in set1])
var3 = np.asarray([row[2] for row in set1])

Is there a faster way to extract the variables from these datasets?

Here is a screenshot of one of the datasets in HDFView: [screenshot]

Upvotes: 2

Views: 2862

Answers (1)

DavidH

Reputation: 451

A much faster way (~0.05 seconds) is to read the whole dataset into a NumPy array in a single call and then reference each field by name:

import numpy as np
import h5py
...
f = h5py.File("foo.hdf5", "r")
set1 = np.asarray(f["foo/bar"])  # one bulk read of the entire dataset
var1 = set1["var1"]              # field access on a structured array is a view
var2 = set1["var2"]
var3 = set1["var3"]
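For readers who want to try this end to end, here is a minimal, self-contained sketch that builds a small file with the same kind of compound layout and reads it back two ways. The dataset path and the field names ("var1".."var3") are assumptions made for illustration; substitute the names shown by HDFView for your file.

```python
import numpy as np
import h5py

# Create a small example file with a compound dtype (assumed layout).
dt = np.dtype([("var1", "<f4"), ("var2", "<f4"), ("var3", "<i4")])
data = np.zeros(200, dtype=dt)
data["var1"] = np.arange(200, dtype="<f4")
data["var2"] = np.arange(200, dtype="<f4") * 2.0
data["var3"] = np.arange(200, dtype="<i4")

with h5py.File("example.hdf5", "w") as f:
    f.create_dataset("foo/bar", data=data)

with h5py.File("example.hdf5", "r") as f:
    dset = f["foo/bar"]
    # Option 1: one bulk read, then slice the structured array by field name.
    arr = dset[...]
    var1 = arr["var1"]
    var2 = arr["var2"]
    # Option 2: ask h5py for a single field straight from the dataset,
    # which avoids loading the fields you don't need.
    var3 = dset["var3"]
```

Option 2 is worth knowing when the compound rows are wide and you only want one or two fields, since h5py performs the field selection during the read rather than after it.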

Upvotes: 3
