DwightFromTheOffice

Reputation: 520

Storing a data stream in an HDF5 file using Python

I have a Python program that accepts a stream of data via UDP at a rate of roughly 1000 Hz. A typical stream lasts about 15 minutes and consists of roughly 10 channels, each a stream of doubles, booleans, or vectors of size 3, together with a timestamp.

Currently, on every iteration (so 1000 times per second), it writes a line with all the values to a CSV file.

To limit the size of the files, I want to change the format to HDF5 and write the data with h5py.

In short, it should look like this:

import threading

class StoreData(threading.Thread):

    def __init__(self):
        super().__init__()
        # placeholder: open the HDF5 file in write mode (e.g. with h5py)
        self.f = open_hdf5_file_as_write()

    def run(self):
        while True:
            # returns True roughly every 0.001 seconds
            if self.new_values_available():
                vals = self.get_new_vals()
                # What is the best thing to do with vals here?

But I have stumbled upon two questions.

  1. What is the best structure for the HDF5 file? Is it best to store the streams in different groups, or just as different datasets in the same group?

  2. How should I write the data? Do I expand the datasets by one entry every iteration using a resize? Do I buffer data locally and append a chunk of n values per stream every n iterations, or do I keep everything in a pandas table and write it just once at the end?

Answering 1 of the 2 questions would already be a big help!

Upvotes: 1

Views: 1880

Answers (1)

kcw78

Reputation: 8046

Both are good questions. I can't give a precise answer without knowing more about your data and workflows. (Note: The HDF Group has a good overview you might want to review here: Introduction to HDF5. It is a good place to learn the possibilities with schema design.) Here are things I would consider in a "thought experiment":

The best structure:
With HDF5, you can define any schema you want (within limits), so the best structure (schema) is the one that works best with your data and processes.

  • Since you have an existing CSV file format, the simplest approach is creating an equivalent NumPy dtype and referencing it to create a recarray that holds the data. This would mimic your current data organization (a minimal sketch of such a dtype follows this list). If you want to get fancier, here are other considerations:
  • Your datatypes: are they homogeneous (all floats or all ints), or heterogeneous (a mix of floats, ints and strings)? You have more options if they are all the same. However, HDF5 also supports mixed types as compound data.
  • Organization: How are you going to use the data? A properly designed schema will help you avoid data gymnastics in the future. Is it advantageous (to you) to save everything in 1 dataset, or to distribute across different datasets/groups? Think of data organized in folders and files on your computer. HDF5 Groups are your folders and the datasets are your files.
  • Convenience of working with the data: similar to organization. How easy or hard is it to write vs. read? It might be easier to write the data as you receive it - but is that a convenient format when you want to process it?
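
For example, a compound NumPy dtype mirroring one CSV row could look like the sketch below (the channel names and types are assumptions, since I don't know your exact columns):

import numpy as np

# hypothetical layout of one sample: adjust names/types to your channels
row_dtype = np.dtype([("timestamp", "f8"),
                      ("ch_double", "f8"),
                      ("ch_bool",   "?"),
                      ("ch_vec3",   "f8", (3,))])

# a recarray buffer for 1000 samples; fields are addressable by name
buf = np.recarray(1000, dtype=row_dtype)
buf[0] = (0.0, 1.23, True, (0.0, 0.0, 0.0))
print(buf.timestamp[0], buf.ch_vec3[0])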

How should I write the data?
There are several Python packages that can write HDF5 data. I am familiar with PyTables (aka tables) and h5py. (Pandas can also create HDF5 files, but I have no experience to share.) Both packages have similar capabilities, and some differences. Both support HDF5 features you need (resizeable datasets, homogeneous and/or heterogeneous data). h5py attempts to map the HDF5 feature set to NumPy as closely as possible. PyTables has an abstraction layer on top of HDF5 and NumPy, with advanced indexing capabilities to quickly perform in-kernel data queries. (Also, I found PyTables I/O is slightly faster than h5py.) For those reasons, I prefer PyTables, but I am equally comfortable with h5py.
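
To give an idea of the PyTables style, here is a minimal sketch that defines a table and appends one row (again, the column names and file name are assumptions):

import numpy as np
import tables as tb

# hypothetical row description mirroring the CSV columns
class Sample(tb.IsDescription):
    timestamp = tb.Float64Col()
    ch_double = tb.Float64Col()
    ch_bool   = tb.BoolCol()
    ch_vec3   = tb.Float64Col(shape=(3,))

with tb.open_file("stream.h5", mode="w") as h5f:
    table = h5f.create_table("/", "samples", Sample, title="UDP stream")
    row = table.row
    row["timestamp"] = 0.001          # placeholder values
    row["ch_double"] = 1.23
    row["ch_bool"] = True
    row["ch_vec3"] = np.zeros(3)
    row.append()                      # appending is cheap; flush periodically
    table.flush()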

How often should I write: every 1 or N iterations, or once at the end?
This is a trade-off of available RAM vs required I/O performance vs coding complexity. There is an I/O "time cost" with each write to the file. So, the fastest process is to save all data in RAM and write at the end. That means you need enough memory to hold a 15 minute datastream. I suspect memory requirements will drive this decision. The good news: PyTables and h5py will support any of these methods.
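
If you go the buffered route with h5py, a minimal sketch could look like this (the buffer size, dataset name, and field names are assumptions; the loop stands in for your acquisition thread):

import numpy as np
import h5py

sample_dtype = np.dtype([("timestamp", "f8"),
                         ("ch_double", "f8"),
                         ("ch_bool",   "?"),
                         ("ch_vec3",   "f8", (3,))])

BUFFER_ROWS = 1000  # at ~1000 Hz this flushes roughly once per second

with h5py.File("stream.h5", "w") as f:
    # resizable, chunked dataset; the chunk size matches the write block
    dset = f.create_dataset("samples", shape=(0,), maxshape=(None,),
                            dtype=sample_dtype, chunks=(BUFFER_ROWS,))
    buf = np.zeros(BUFFER_ROWS, dtype=sample_dtype)
    count = 0
    for i in range(2500):            # stand-in for the acquisition loop
        buf[count] = (i * 0.001, float(i), i % 2 == 0, (0.0, 1.0, 2.0))
        count += 1
        if count == BUFFER_ROWS:     # buffer full: append one chunk to the file
            start = dset.shape[0]
            dset.resize(start + count, axis=0)
            dset[start:start + count] = buf[:count]
            count = 0
    if count:                        # flush whatever is left at the end
        start = dset.shape[0]
        dset.resize(start + count, axis=0)
        dset[start:start + count] = buf[:count]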

Upvotes: 3
