Reputation: 1319
I have a dataset, which records temperatures of devices:
Device_ID Temperature Label
1 50 0
1 40 0
2 34 0
1 78 0
2 66 1
3 88 0
1 12 0
2 33 0
3 91 0
Devices can send temperature at any time. The above dataset is already sorted by time.
What I want to do?
Classify devices based on their temperature pattern.
I already have the labels for the dataset, labels indicate whether the devices went bad after a measurement (1 means that after this measurement the device went bad, 0 means its okay). Once the label goes "bad", the device is now permanently bad and I need not consider the following measurements for that id.
So, I want to construct the classification dataset as:
Device_ID T1 T2 T3 T4 ... Tn Label
1 50 40 78 ... 0
2 34 66 ..... 1 (33 measurement now does not matter)
So, my question is how do I convert these time series values in the rows to columns in Pandas? How to do this task efficiently?
I can use Pivot, but Pivot works only for Categorical variables.
Any help is greatly appreciated.
Upvotes: 4
Views: 2323
Reputation: 18647
Here is one way to do it.
Create a new helper column col
using groupby.cumcount
and a helper series label
of max Label for each 'Device_ID'. Then use DataFrame.pivot
and assign your helper series:
df['col'] = 'T' + (df.groupby('Device_ID').Temperature.cumcount() + 1).astype(str)
labels = df.groupby('Device_ID')['Label'].max()
df_new = df.pivot(index='Device_ID', columns='col', values='Temperature').assign(Label=labels)
print(df_new)
col T1 T2 T3 T4 Label
Device_ID
1 50.0 40.0 78.0 12.0 0
2 34.0 66.0 33.0 NaN 1
3 88.0 91.0 NaN NaN 0
Upvotes: 2