Reputation: 5155
I would like to create a column ['diff]
which is the difference of adjacent Status
values to each individual group of ID
independent of other groups.
df
:
ID Timestamp Value Status
103177 64 2010-09-21 23:13:21.090 21.5 1.0
252019 64 2010-09-22 00:44:14.890 21.5 1.0
271381 64 2010-09-22 00:44:15.890 21.5 0.0
268939 64 2010-09-22 00:44:17.890 23.0 0.0
259875 64 2010-09-22 00:44:18.440 23.0 1.0
18870 32 2010-09-22 00:44:19.890 24.5 1.0
205910 32 2010-09-22 00:44:23.440 24.5 1.0
103865 32 2010-09-22 01:04:33.440 23.5 0.0
152281 32 2010-09-22 01:27:01.790 22.5 1.0
138988 32 2010-09-22 02:18:52.850 21.5 0.0
reproducible example:
from pandas import *
from numpy import nan
df = pd.DataFrame({'ID': {103177: 64,
252019: 64,
271381: 64,
268939: 64,
259875: 64,
18870: 32,
205910: 32,
103865: 32,
152281: 32,
138988: 32},
'Timestamp': {103177: Timestamp('2010-09-21 23:13:21.090000'),
252019: Timestamp('2010-09-22 00:44:14.890000'),
271381: Timestamp('2010-09-22 00:44:15.890000'),
268939: Timestamp('2010-09-22 00:44:17.890000'),
259875: Timestamp('2010-09-22 00:44:18.440000'),
18870: Timestamp('2010-09-22 00:44:19.890000'),
205910: Timestamp('2010-09-22 00:44:23.440000'),
103865: Timestamp('2010-09-22 01:04:33.440000'),
152281: Timestamp('2010-09-22 01:27:01.790000'),
138988: Timestamp('2010-09-22 02:18:52.850000')},
'Value': {103177: 21.5,
252019: 21.5,
271381: 21.5,
268939: 23.0,
259875: 23.0,
18870: 24.5,
205910: 24.5,
103865: 23.5,
152281: 22.5,
138988: 21.5},
'Status': {103177: 1.0,
252019: 1.0,
271381: 0.0,
268939: 0.0,
259875: 1.0,
18870: 1.0,
205910: 1.0,
103865: 0.0,
152281: 1.0,
138988: 0.0}})
df
Expected output:
ID Timestamp Value Status Diff
103177 64 2010-09-21 23:13:21.090 21.5 1.0 NaN
252019 64 2010-09-22 00:44:14.890 21.5 1.0 0.0
271381 64 2010-09-22 00:44:15.890 21.5 0.0 -1.0
268939 64 2010-09-22 00:44:17.890 23.0 0.0 0.0
259875 64 2010-09-22 00:44:18.440 23.0 1.0 1.0
18870 64 2010-09-22 00:44:19.890 24.5 1.0 0.0
205910 32 2010-09-22 00:44:23.440 24.5 1.0 NaN
103865 32 2010-09-22 01:04:33.440 23.5 0.0 -1.0
152281 32 2010-09-22 01:27:01.790 22.5 1.0 1.0
138988 32 2010-09-22 02:18:52.850 21.5 0.0 -1.0
How can I do this?
Upvotes: 1
Views: 37
Reputation: 16683
You are looking for groupby + .diff
:
df['Diff'] = df.groupby('ID')['Status'].diff()
df
Out[1]:
ID Timestamp Value Status Diff
103177 64 2010-09-21 23:13:21.090 21.5 1.0 NaN
252019 64 2010-09-22 00:44:14.890 21.5 1.0 0.0
271381 64 2010-09-22 00:44:15.890 21.5 0.0 -1.0
268939 64 2010-09-22 00:44:17.890 23.0 0.0 0.0
259875 64 2010-09-22 00:44:18.440 23.0 1.0 1.0
18870 32 2010-09-22 00:44:19.890 24.5 1.0 NaN
205910 32 2010-09-22 00:44:23.440 24.5 1.0 0.0
103865 32 2010-09-22 01:04:33.440 23.5 0.0 -1.0
152281 32 2010-09-22 01:27:01.790 22.5 1.0 1.0
138988 32 2010-09-22 02:18:52.850 21.5 0.0 -1.0
Upvotes: 2
Reputation: 195633
df['Diff'] = df.groupby('ID')['Status'].transform('diff')
print(df)
Prints:
ID Timestamp Value Status Diff
103177 64 2010-09-21 23:13:21.090 21.5 1.0 NaN
252019 64 2010-09-22 00:44:14.890 21.5 1.0 0.0
271381 64 2010-09-22 00:44:15.890 21.5 0.0 -1.0
268939 64 2010-09-22 00:44:17.890 23.0 0.0 0.0
259875 64 2010-09-22 00:44:18.440 23.0 1.0 1.0
18870 32 2010-09-22 00:44:19.890 24.5 1.0 NaN
205910 32 2010-09-22 00:44:23.440 24.5 1.0 0.0
103865 32 2010-09-22 01:04:33.440 23.5 0.0 -1.0
152281 32 2010-09-22 01:27:01.790 22.5 1.0 1.0
138988 32 2010-09-22 02:18:52.850 21.5 0.0 -1.0
Upvotes: 1