Abufari

Reputation: 78

How to store series of time series data, including metadata

I have some raw data consisting of an array of (around 1e7) events, each made up of some metadata (time of recording, channel number, etc.) and the actual time series. By processing the data, I compute an array of (around 20) single-valued features for each event. So I end up with the following data structure of single values plus one array per event:

1: eventID, ToR, channel, feat1, feat2,..., signal(shape=30000,)
2: eventID, ToR, channel, feat1, feat2,..., signal
3: ...

Now what is the best way to structure this data in Python, if I want to:

  1. Access and plot signals, indexed by their eventID
  2. Slice through all the events (or a subset of them) and plot feat1 vs. feat2
  3. Access data columns by name
  4. Possibly look up the eventID by providing two of its features
  5. Possibly remove single events

While storing the data in a Python dictionary would surely be possible, I reckon there is a faster way. A pandas DataFrame doesn't seem feasible to me because the signals don't have equal lengths.

By taking a NumPy array of dtype object it is possible to store arrays inside arrays, like [1, 2, 3, [4, 5, 6]], but then I lose the ability to access the data by name, which is not ideal.
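For illustration, a rough sketch of that object-dtype workaround (names and shapes are just examples, not my real columns):

    import numpy as np

    # One "row" per event: metadata scalars followed by the variable-length signal.
    # dtype=object lets the last element be an array of arbitrary length.
    event = np.array([42, 1.5e9, 3, 0.1, 0.2, np.zeros(30000)], dtype=object)

    signal = event[5]   # the raw time series
    feat1 = event[3]    # positional only -- nothing ties index 3 to the name "feat1"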

I may not have the right feel for this kind of structure, so what are appropriate approaches for this?

Upvotes: 1

Views: 1190

Answers (1)

John Zwinck

Reputation: 249123

I'd store all the metadata in a single DataFrame:

1: eventID, ToR, channel, feat1, feat2,...
2: eventID, ToR, channel, feat1, feat2,...
3: ...

Then, for the time series, which have different lengths, I would store each one in a pd.Series, kept either in a dict keyed the same as the metadata index (or maybe by eventID) or in a list (row N of the time series list maps to row N of the metadata).
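A minimal sketch of that layout, with made-up column names and values (ToR, channel, feat1, feat2 and the eventIDs are only assumptions taken from the question's description):

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt

    # Metadata: one row per event, columns addressable by name, indexed by eventID.
    meta = pd.DataFrame({
        'ToR':     [1.5e9, 1.5e9 + 60],
        'channel': [3, 7],
        'feat1':   [0.12, 0.34],
        'feat2':   [5.6, 7.8],
    }, index=pd.Index([101, 102], name='eventID'))

    # Signals: one variable-length pd.Series per event, in a dict keyed by eventID.
    signals = {101: pd.Series(np.random.randn(30000)),
               102: pd.Series(np.random.randn(12345))}

    # 1. Access and plot a signal by eventID:
    signals[101].plot()
    plt.show()

    # 2./3. Slice the events and plot feat1 vs. feat2, addressing columns by name:
    meta.plot.scatter(x='feat1', y='feat2')
    plt.show()

    # 4. Find eventIDs from two of their features:
    ids = meta.index[(meta['channel'] == 3) & (meta['feat1'] > 0.1)]

    # 5. Remove a single event from both containers:
    meta = meta.drop(101)
    signals.pop(101, None)

Keeping the ragged signals outside the DataFrame keeps the metadata table rectangular (so column slicing and filtering stay fast), while the dict gives direct lookup of any signal by eventID.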

Upvotes: 1
