Abufari

Reputation: 78

How to store series of time series data, including metadata

I have some raw data consisting of an array of (around 1e7) events, each made up of some metadata (time of recording, channel number, etc.) and the actual time series. By processing the data, I compute an array of (around 20) single-valued features for each event. So I end up with the following data structure of single values plus one array per event:

1: eventID, ToR, channel, feat1, feat2,..., signal(shape=30000,)
2: eventID, ToR, channel, feat1, feat2,..., signal
3: ...

Now what is the best way to structure this data in Python, if I want to:

  1. Access and plot signals, indexed by their eventID
  2. Slice through all the events (or a subset of them) and plot feat1 vs. feat2
  3. Access data columns by name
  4. Possibly look up the eventID by providing two of its features
  5. Possibly remove single events

While storing the data in a Python dictionary would surely be possible, I reckon there is a faster way. A pandas DataFrame doesn't seem feasible to me because the signals don't have equal lengths.

By taking a NumPy array of dtype object it is possible to store arrays inside arrays, like [1, 2, 3, [4, 5, 6]], but then I lose the ability to access the data by name, which is not ideal.
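For illustration, a rough sketch of that object-dtype workaround (names and shapes are just examples, not my real columns):

    import numpy as np

    # One "row" per event: metadata scalars followed by the variable-length signal.
    # dtype=object lets the last element be an array of arbitrary length.
    event = np.array([42, 1.5e9, 3, 0.1, 0.2, np.zeros(30000)], dtype=object)

    signal = event[5]   # the raw time series
    feat1 = event[3]    # positional only -- nothing ties index 3 to the name "feat1"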

I may not have the right feel for this kind of structure, so what are appropriate approaches for this?

Upvotes: 1

Views: 1190

Answers (1)

John Zwinck

Reputation: 249123

I'd store all the metadata in a single DataFrame:

1: eventID, ToR, channel, feat1, feat2,...
2: eventID, ToR, channel, feat1, feat2,...
3: ...

Then, for the time series, which have different lengths, I would store each one in a pd.Series, kept either in a dict keyed the same as the metadata index (or maybe by eventID) or in a list (row N of the time series list maps to row N of the metadata).
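A minimal sketch of that layout, with made-up column names and values (ToR, channel, feat1, feat2 and the eventIDs are only assumptions taken from the question's description):

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt

    # Metadata: one row per event, columns addressable by name, indexed by eventID.
    meta = pd.DataFrame({
        'ToR':     [1.5e9, 1.5e9 + 60],
        'channel': [3, 7],
        'feat1':   [0.12, 0.34],
        'feat2':   [5.6, 7.8],
    }, index=pd.Index([101, 102], name='eventID'))

    # Signals: one variable-length pd.Series per event, in a dict keyed by eventID.
    signals = {101: pd.Series(np.random.randn(30000)),
               102: pd.Series(np.random.randn(12345))}

    # 1. Access and plot a signal by eventID:
    signals[101].plot()
    plt.show()

    # 2./3. Slice the events and plot feat1 vs. feat2, addressing columns by name:
    meta.plot.scatter(x='feat1', y='feat2')
    plt.show()

    # 4. Find eventIDs from two of their features:
    ids = meta.index[(meta['channel'] == 3) & (meta['feat1'] > 0.1)]

    # 5. Remove a single event from both containers:
    meta = meta.drop(101)
    signals.pop(101, None)

Keeping the ragged signals outside the DataFrame keeps the metadata table rectangular (so column slicing and filtering stay fast), while the dict gives direct lookup of any signal by eventID.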

Upvotes: 1
