Time series storage in HDF5 format

I want to store time series results (sensor data) in an HDF5 file, but I cannot seem to assign values to my dataset. Clearly I am doing something wrong; I am just not sure what…

The code:

from datetime import datetime, timezone
import h5py
import numpy as np

TIME_SERIES_FLOAT = np.dtype([("time", h5py.special_dtype(vlen=str)),
                              ("value", np.float64)])

h5 = h5py.File('balh.h5', "w")
dset = h5.create_dataset('data', (1, 2), chunks=True, maxshape=(None, 2), dtype=TIME_SERIES_FLOAT)
dset[0]['time'] = datetime.now(timezone.utc).astimezone().isoformat()
dset[0]['value'] = 0.0

Then the update code resizes the dataset and adds more values. Clearly doing that per value is inefficient:

size = list(dset.shape)
size[0] += 1
dset.resize(tuple(size))
dset[size[0]-1]['time'] = datetime.now(timezone.utc).astimezone().isoformat()
dset[size[0]-1]['value'] = value

A much better method would be to collate some data into an np.array and then add that every so often…

Is this sensible?…

Upvotes: 1


Answers (1)

I need more coffee…

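The dataset's dtype is a compound record holding a string (the time) and a float (the value). Indexing with dset[0] returns an in-memory copy of that record, so assigning to its fields never reaches the file. To set a record, I need to assign the whole tuple at once: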

dset[-1] = (datetime.now(timezone.utc).astimezone().isoformat(), value)

It is actually that simple!
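
For anyone who wants to try this in isolation, here is a minimal sketch. It assumes a one-dimensional dataset (each compound record already carries both fields) and an in-memory file so nothing is written to disk. The helper name append_sample and the file name sketch.h5 are made up for illustration.

```python
from datetime import datetime, timezone

import h5py
import numpy as np

# Compound record: variable-length string timestamp plus a 64-bit float.
TIME_SERIES_FLOAT = np.dtype([("time", h5py.special_dtype(vlen=str)),
                              ("value", np.float64)])


def append_sample(dset, value):
    """Grow a 1-D dataset by one record and write (timestamp, value) into it.

    Hypothetical helper, not part of h5py.
    """
    stamp = datetime.now(timezone.utc).astimezone().isoformat()
    dset.resize(dset.shape[0] + 1, axis=0)
    dset[-1] = (stamp, value)  # whole-record assignment reaches the file


# The core driver with backing_store=False keeps the file in memory only.
with h5py.File("sketch.h5", "w", driver="core", backing_store=False) as h5:
    dset = h5.create_dataset("data", shape=(1,), maxshape=(None,),
                             chunks=True, dtype=TIME_SERIES_FLOAT)
    dset[0] = ("start", 1.0)

    # This modifies a temporary copy returned by dset[0], not the dataset.
    dset[0]["value"] = 99.0
    after_field_write = float(dset[0]["value"])  # stored value is unchanged

    append_sample(dset, 2.5)
    values = dset["value"]      # read a single field for all records
    n_records = dset.shape[0]
```

Resizing by one record per sample is still slow for high-rate data, so this is only meant to show the assignment pattern.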

Adding many entries is done this way:

rows = [('stamp', x) for x in range(10)]
size = list(dset.shape)
start = size[0]                 # index of the first new row
size[0] += len(rows)
dset.resize(tuple(size))
for offset, row in enumerate(rows):
    dset[start + offset] = row

Nonetheless, this feels somewhat clunky and sub-optimal…
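
One way to make it less clunky, sketched under some assumptions: collect the samples in a list, convert them to a NumPy structured array with the same dtype, and write the whole block with a single slice assignment after one resize. The dataset is assumed to be one-dimensional here, and append_batch and the file name batch.h5 are hypothetical names for illustration.

```python
import h5py
import numpy as np

# Same compound record as above: string timestamp plus 64-bit float.
TIME_SERIES_FLOAT = np.dtype([("time", h5py.special_dtype(vlen=str)),
                              ("value", np.float64)])


def append_batch(dset, rows):
    """Append a list of (timestamp, value) tuples with one resize and one write.

    Hypothetical helper, not part of h5py.
    """
    block = np.array(rows, dtype=TIME_SERIES_FLOAT)
    start = dset.shape[0]
    dset.resize(start + len(block), axis=0)
    dset[start:] = block  # single slice assignment instead of a Python loop


# In-memory file (core driver, no backing store) keeps the sketch self-contained.
with h5py.File("batch.h5", "w", driver="core", backing_store=False) as h5:
    # Start empty; maxshape=(None,) lets the dataset grow along axis 0.
    dset = h5.create_dataset("data", shape=(0,), maxshape=(None,),
                             chunks=True, dtype=TIME_SERIES_FLOAT)

    append_batch(dset, [("stamp", float(x)) for x in range(10)])
    append_batch(dset, [("later", 10.0), ("later", 11.0)])

    total = dset.shape[0]
    last_value = float(dset[-1]["value"])
    values = dset["value"]
```

Buffering samples in memory and flushing them every so often, as suggested in the question, fits this pattern: the list passed to append_batch is the buffer.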

Upvotes: 1
