Reputation: 1934
I am using the extract_features function in tsfresh to extract features from a collection of time series. Basically, what I have is a dictionary of dataframes, where the id column holds a single value within each dataframe, but a different value for each dataframe in the dictionary. I do the following:
extracted_features = extract_features(subsets, column_id='id', column_value='#text', feature_extraction_settings=MinimalFeatureExtractionSettings())
Here, subsets is the dictionary containing the dataframes. I basically cut one big time series into chunks and put them in a dictionary. Now I want to extract the features of each chunk to train an ML algorithm and classify parts of the series as 1 or 0.
However, I noticed that extracted_features contains a sparse matrix of 604 rows x 4832 columns. What it does is create columns for the 8 basic features (median, minimum, sum_values, maximum, variance, standard_deviation, mean, length) per time series for each(!) row, i.e. 604 x 8 = 4832 columns. Hence, select_features takes forever, and it fails if I use dropna() because I am left with an empty DataFrame. I don't understand why it creates a set of columns for each row. How can I prevent this from happening? (I want to use this with more features, but I have the same problem using different settings.)
Upvotes: 0
Views: 695
Reputation: 547
I am the author of tsfresh. Are you using the latest version? How many chunks do you have? It could be that your dictionary is not properly constructed.
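One possible explanation, stated as an assumption and not confirmed for your version: when extract_features receives a dictionary, tsfresh may read each key as a separate kind of time series and compute every feature once per kind. With 604 chunks and 8 features that would give exactly 604 x 8 = 4832 columns, mostly empty. A minimal sketch of the usual workaround is to stack the chunks into a single long dataframe and let the id column distinguish them. The toy subsets dictionary below is a hypothetical stand-in for yours, and the tsfresh call is left as a comment because it is copied from the question and not run here.

```python
import pandas as pd

# Hypothetical stand-in for the `subsets` dictionary from the question:
# one small dataframe per chunk, each with a constant `id` and a `#text` value column.
subsets = {
    chunk_id: pd.DataFrame(
        {"id": chunk_id, "#text": [float(chunk_id + i) for i in range(5)]}
    )
    for chunk_id in range(3)
}

# Stack all chunks into one long dataframe; the `id` column still tells them apart.
combined = pd.concat(list(subsets.values()), ignore_index=True)

print(combined.shape)
print(combined["id"].nunique())

# The call from the question would then take the single dataframe instead of
# the dictionary (not executed here; requires tsfresh and the settings class
# used in the question):
# extracted_features = extract_features(
#     combined, column_id="id", column_value="#text",
#     feature_extraction_settings=MinimalFeatureExtractionSettings())
```

If the assumption holds, the result should have one row per id and only one set of feature columns, so dropna() and select_features no longer have to work on a mostly empty matrix.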
Upvotes: 1