The algorithm behind tsfresh select_features method

Question

I recently started using the tsfresh library to extract features from time-series data.

It's very cool that I can get a bag of features in a few lines of code, but I have doubts about the logic behind the select_features method. I looked through the official documentation and googled it, but I couldn't find which algorithm it uses. I want to know how it works so that I can decide what to do in the feature selection phase after processing my data with tsfresh.

rtavenar · Accepted Answer

According to the tsfresh documentation, what they do is:

They extract a whole set of features.
They test each feature individually for significance (in a supervised setting, so the test is something like "is this feature useful to predict that output?") and keep the significant ones using the Benjamini-Yekutieli procedure, a multiple-testing correction that controls the false discovery rate.
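
To make the selection step more concrete, here is a minimal pure-Python sketch of the Benjamini-Yekutieli step-up rule applied to a list of per-feature p-values. This is an illustration of the procedure described in [2], not tsfresh's actual code; the function name benjamini_yekutieli and its arguments are my own, and how the p-values are computed (which test is run for each feature) is left out.

```python
def benjamini_yekutieli(p_values, fdr_level=0.05):
    """Return one boolean per p-value: True if that hypothesis is rejected,
    i.e. the corresponding feature is kept as relevant.

    Hypothetical helper for illustration only, not part of tsfresh.
    """
    m = len(p_values)
    if m == 0:
        return []

    # Correction factor for arbitrary dependence between the tests:
    # c(m) = 1 + 1/2 + ... + 1/m
    c_m = sum(1.0 / i for i in range(1, m + 1))

    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Step-up rule: find the largest rank k such that
    # p_(k) <= k / (m * c(m)) * fdr_level.
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / (m * c_m) * fdr_level:
            largest_k = rank

    # Reject every hypothesis whose rank is at most largest_k.
    keep = [False] * m
    for idx in order[:largest_k]:
        keep[idx] = True
    return keep


# Example with made-up p-values for four features.
p_values = [0.001, 0.5, 0.02, 0.8]
print(benjamini_yekutieli(p_values, fdr_level=0.05))
```

Note that the rule is "step-up": a feature whose p-value misses its own threshold is still kept if a feature ranked after it meets its threshold. As far as I know, select_features exposes the target false discovery rate through its fdr_level parameter, so that is the main knob to tune.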

The references they provide should be of interest:

[1] Christ, M., Kempa-Liehr, A.W. and Feindt, M. (2016). Distributed and parallel time series feature extraction for industrial big data applications. ArXiv e-prints: 1610.07717. URL: http://adsabs.harvard.edu/abs/2016arXiv161007717C

[2] Benjamini, Y. and Yekutieli, D. (2001). The control of the false discovery rate in multiple testing under dependency. Annals of Statistics, 29(4), 1165–1188.

Here, [1] is the paper describing tsfresh and [2] is the reference for the multiple testing procedure (the Benjamini-Yekutieli procedure mentioned above).
