I recently started to use the tsfresh library to extract features from time-series data.
It's very cool that I can get a bag of features in a few lines of code, but I have doubts about the logic behind the select_features method. I looked into the official documentation and googled it, but I couldn't find which algorithm it uses. I want to know how it works so that I can decide what to do in the feature selection phase after processing the data with tsfresh.
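According to the feature selection page of their documentation, each extracted feature is evaluated individually and independently for its significance in predicting the target. This yields a vector of p-values, one per feature, which is then evaluated with the Benjamini-Yekutieli procedure to decide which features to keep.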
The references they provide should be of interest:
[1] Christ, M., Kempa-Liehr, A.W. and Feindt, M. (2016). Distributed and parallel time series feature extraction for industrial big data applications. arXiv e-prints, 1610.07717. URL: http://adsabs.harvard.edu/abs/2016arXiv161007717C
[2] Benjamini, Y. and Yekutieli, D. (2001). The control of the false discovery rate in multiple testing under dependency. Annals of Statistics, 29(4), 1165–1188.
where [1] is the paper describing tsfresh and [2] is the reference for the multiple testing procedure (called the Benjamini-Yekutieli procedure above).
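The Benjamini-Yekutieli step-up procedure from [2] fits in a few lines: sort the m p-values, find the largest rank k with p_(k) <= k * q / (m * c(m)), where c(m) = 1 + 1/2 + ... + 1/m, and keep the k features with the smallest p-values. Here q is the target false discovery rate, which tsfresh's select_features exposes as fdr_level. The function name and the independent flag below are illustrative, not tsfresh's own code; setting c(m) = 1 turns the procedure into Benjamini-Hochberg, which assumes independent tests and is less conservative.

```python
def benjamini_yekutieli(p_values, fdr_level=0.05, independent=False):
    """Return one boolean per p-value: True where the null hypothesis is
    rejected, i.e. the feature would be kept as relevant.

    independent=True uses c(m) = 1 (Benjamini-Hochberg) instead of the
    harmonic-sum correction that makes BY valid under arbitrary dependence.
    """
    m = len(p_values)
    if m == 0:
        return []
    c_m = 1.0 if independent else sum(1.0 / i for i in range(1, m + 1))
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: the LARGEST rank k whose p-value is under its threshold.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * fdr_level / (m * c_m):
            k_max = rank
    keep = [False] * m
    for idx in order[:k_max]:
        keep[idx] = True
    return keep


p = [0.001, 0.2, 0.01, 0.04, 0.9]  # hypothetical per-feature p-values
print(benjamini_yekutieli(p, 0.05))
print(benjamini_yekutieli(p, 0.05, independent=True))
```

With these five p-values and q = 0.05, the BY correction keeps only the first feature, while the Benjamini-Hochberg variant also keeps the third. That extra strictness is the price of staying valid under arbitrary dependence between the tests, which matters here because features extracted from the same time series are often correlated.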