k88

Reputation: 1934

(Python) tsfresh extract_features creates a set of columns per row, resulting in a huge DataFrame with lots of NaN

I am using the extract_features function in tsfresh to extract features from a collection of time series. Basically, what I have is a dictionary of dataframes that look like this:

[image: dataframe]

where the column id holds a single value per dataframe, and that value is different for each dataframe in the dictionary. I do the following:

extracted_features = extract_features(subsets, column_id='id', column_value='#text', feature_extraction_settings=MinimalFeatureExtractionSettings())

Here, subsets is the dictionary containing the dataframes. I basically cut one big time series into chunks and put them in a dictionary. Now I want to extract the features of each chunk to train an ML algorithm and classify parts of the series as 1 or 0.

However, I noticed that extracted_features contains a sparse matrix of 604 rows x 4832 columns. It creates columns for the 8 basic features per time series for each(!) row (median, minimum, sum_values, maximum, variance, standard_deviation, mean, length). As a result, select_features takes forever, and dropna() leaves me with an empty DataFrame. I don't understand why it creates a set of columns for each row. How can I prevent this from happening? (I want to use this with more features, but I have the same problem with different settings.)

Upvotes: 0

Views: 695

Answers (1)

MaxBenChrist

Reputation: 547

I am the author of tsfresh. Are you using the latest version? How many chunks do you have? It could be that your dictionary is not properly constructed.
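
One thing worth checking, as a guess rather than a confirmed diagnosis: 604 chunks x 8 features = 4832 columns, which is what you would get if every dictionary key were treated as its own kind of time series, so each chunk only fills its own 8 columns and the rest are NaN. Assuming that is what happens here, a minimal sketch of the alternative layout is to stack the chunks into one long DataFrame in which the id column alone tells the chunks apart. The toy data below is hypothetical, and the per-chunk features are computed with plain pandas only to show the expected shape (one row per chunk, 8 columns); it is not tsfresh's own implementation.

```python
import pandas as pd

# Hypothetical stand-in for the question's `subsets`: one DataFrame per chunk,
# each with a constant 'id' column and a '#text' value column.
raw_chunks = {0: [1.0, 2.0, 3.0], 1: [4.0, 6.0], 2: [5.0, 5.0, 5.0, 9.0]}
subsets = {
    chunk_id: pd.DataFrame({"id": chunk_id, "#text": values})
    for chunk_id, values in raw_chunks.items()
}

# Stack all chunks into a single long-format DataFrame; 'id' distinguishes chunks.
combined = pd.concat(list(subsets.values()), ignore_index=True)

# Compute eight basic statistics per chunk with plain pandas.
# Note: pandas 'var' and 'std' use the sample estimate (ddof=1), so the numbers
# may differ slightly from what another library reports for the same data.
features = combined.groupby("id")["#text"].agg(
    ["median", "min", "sum", "max", "var", "std", "mean", "count"]
)

print(features.shape)
```

If the dictionary really is the cause, passing a single DataFrame like combined to extract_features with column_id='id' and column_value='#text' should give one row per chunk with one set of feature columns, but please verify that against the version you have installed.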

Upvotes: 1
