Reputation: 78
I have some raw data, which consists of an array of around 1e7 events, each of them consisting of some metadata (time of recording, channel number, etc.) and the actual time series. By processing the data, I compute an array of around 20 single-valued features for each event. So I end up with the following data structure of single values and one array:
1: eventID, ToR, channel, feat1, feat2,..., signal(shape=30000,)
2: eventID, ToR, channel, feat1, feat2,..., signal
3: ...
Now, what is the best way to structure this data in Python?
While storing the data in a Python dictionary would surely be possible, I reckon there is a faster way. A pandas DataFrame doesn't seem possible to me, because the time series don't have equal lengths.
By using a NumPy array of type object, it is possible to store arrays within arrays, like [1, 2, 3, [4, 5, 6]], but then I lose the ability to access the data by name, which is not preferred, to say the least.
I might not have the right intuition for this kind of structure, so what are appropriate approaches here?
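For reference, a minimal sketch of the object-dtype workaround mentioned above, with made-up toy values; it shows why named access is lost (fields are reachable only by integer position):

    import numpy as np

    # Mixed scalars and a variable-length array can share one object array,
    # but the fields are only reachable by position, not by name.
    row = np.array([1, 2, 3, np.array([4, 5, 6])], dtype=object)

    print(row[0])  # the eventID -- but nothing says so; it is just index 0
    print(row[3])  # the nested signal array, again identified only by position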
Upvotes: 1
Views: 1190
Reputation: 249123
I'd store all the metadata in a single DataFrame:
1: eventID, ToR, channel, feat1, feat2,...
2: eventID, ToR, channel, feat1, feat2,...
3: ...
Then, for the time series, which have different lengths, I would store each one in a pd.Series, held either in a dict keyed the same as the metadata index (or maybe by eventID), or in a list (row N of the time series maps to row N of the metadata).
Upvotes: 1