Reputation: 8413
tsfresh needs its input data in a specific format. I initially assumed that column_id is just the row index, but I fear that's wrong.
I have sensor data from a pressure sensor, a temperature sensor and a humidity sensor, captured at 10-second intervals, so it's a 4-column pandas DataFrame. How should the data be laid out? What is column_id?
The documentation is good, but I can't understand what they mean by "entity". Each sensor measures a distinct quantity, and all of them are installed in one machine unit.
Upvotes: 3
Views: 2072
Reputation: 869
This column indicates which entity each time series belongs to. Features are extracted individually for each entity, and the resulting feature matrix contains one row per entity. In the example given in the documentation, you have values for 6 sensors on different robots at different times. There, each robot is a different entity, so each robot has its own id.
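To make "one row per entity" concrete, here is a small pandas-only sketch that emulates the grouping effect of column_id. It is not tsfresh's implementation: the robot ids, the sensor names F_x/F_y and all values are hypothetical, and only three toy features (mean, max, last value) are computed. With tsfresh installed, the analogous real call would be extract_features(df, column_id="id", column_sort="time").

```python
import pandas as pd

# Hypothetical wide-format data: two robots (entities), each with a short
# time series for two sensors. "id" plays the role of column_id and
# "time" the role of column_sort.
df = pd.DataFrame({
    "id":   [1, 1, 1, 2, 2, 2],
    "time": [0, 1, 2, 0, 1, 2],
    "F_x":  [-1.0, 0.0, 1.0, 2.0, 2.0, 5.0],
    "F_y":  [3.0, 3.0, 6.0, 0.0, 1.0, 2.0],
})


def toy_feature_matrix(data: pd.DataFrame, column_id: str, column_sort: str) -> pd.DataFrame:
    """Emulate what column_id does: compute features per entity, one row each.

    Only a few simple statistics are computed; tsfresh computes many more.
    """
    # Sort within each entity so order-dependent features ("last") are correct.
    ordered = data.sort_values([column_id, column_sort])
    value_cols = [c for c in data.columns if c not in (column_id, column_sort)]
    features = ordered.groupby(column_id)[value_cols].agg(["mean", "max", "last"])
    # Flatten ("F_x", "mean") into "F_x__mean", similar in spirit to tsfresh's names.
    features.columns = [f"{col}__{stat}" for col, stat in features.columns]
    return features


X = toy_feature_matrix(df, column_id="id", column_sort="time")
print(X)  # two rows, one per robot id
```

Each distinct value in the id column becomes exactly one row of X, no matter how many time steps that robot has.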
Or, if you have data on different vendors and the number of items they sell in different categories at different timestamps, the vendor id can be used as your column_id.
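A minimal sketch of the vendor case, again emulated with plain pandas rather than tsfresh. The data is hypothetical and stored in "long" format, which in tsfresh's terms would map to column_id="vendor", column_kind="category", column_sort="timestamp" and column_value="items_sold". A single toy feature (total items sold) stands in for tsfresh's feature set; the point is that the vendor becomes the row and each category becomes a separate group of feature columns.

```python
import pandas as pd

# Hypothetical long-format sales data: one row per (vendor, category, timestamp).
sales = pd.DataFrame({
    "vendor":     ["A", "A", "A", "A", "B", "B", "B", "B"],
    "category":   ["toys", "toys", "books", "books", "toys", "toys", "books", "books"],
    "timestamp":  [1, 2, 1, 2, 1, 2, 1, 2],
    "items_sold": [5, 7, 1, 3, 0, 4, 10, 12],
})


def per_vendor_totals(data: pd.DataFrame) -> pd.DataFrame:
    """One row per vendor (the entity), one column per category (the kind)."""
    totals = data.groupby(["vendor", "category"])["items_sold"].sum()
    # Move categories from the row index into columns: one row per vendor remains.
    wide = totals.unstack("category")
    wide.columns = [f"{kind}__sum" for kind in wide.columns]
    return wide


result = per_vendor_totals(sales)
print(result)
```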
Upvotes: 0
Reputation: 36028
The source code sheds some light on this ciphertext (tsfresh/feature_extraction/extraction.py:76):
:param column_id: The name of the id column to group by.
:type column_id: str
So this is a column that should have the same value for all points of one time series. If the column contains several distinct values, the library interprets the data as several time series, one per distinct value, and extracts features for all of them in a single call.
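Applied to the question's setup (three sensors on one machine, sampled every 10 seconds), a sketch with made-up readings shows why the row index is the wrong choice for column_id. Everything here is plain pandas; the column name machine_id and all values are hypothetical. A constant id keeps all readings in one entity, while using the row index as the id would split the recording into one-point "series".

```python
import pandas as pd

# Hypothetical readings from ONE machine unit, one row every 10 seconds.
readings = pd.DataFrame({
    "time":        [0, 10, 20, 30, 40, 50],  # seconds since start
    "pressure":    [101.2, 101.3, 101.1, 101.4, 101.5, 101.3],
    "temperature": [20.1, 20.3, 20.2, 20.6, 20.8, 20.7],
    "humidity":    [45.0, 45.5, 46.0, 45.8, 45.2, 45.1],
})

# All rows belong to the same entity (the machine), so the id is constant.
readings["machine_id"] = 1
print(readings.groupby("machine_id").ngroups)  # 1 entity -> 1 row of features

# Using the row index as the id instead yields 6 "time series" of length 1.
print(readings.reset_index().groupby("index").ngroups)  # 6
```

With tsfresh installed, passing this frame as extract_features(readings, column_id="machine_id", column_sort="time") should therefore produce a feature matrix with a single row describing the whole recording.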
Upvotes: 1