Reputation: 197
I have data similar to the following:
datetime type value
2021-08-13 23:46:00 type1 10
2021-08-13 23:47:00 type1 20
2021-08-13 23:48:00 type1 40
2021-08-13 23:48:00 type2 5
2021-08-13 23:49:00 type2 15
2021-08-13 23:50:00 type2 20
2021-08-13 23:50:00 type1 25
2021-08-13 23:51:00 type1 35
2021-08-13 23:51:00 type1 40
As you can see, there can be rows with the same datetime but a different type. The type field can only be type1 or type2.
I need the differences between consecutive values within each type, i.e. diff() should only compare rows of the same type. The first value of each type should become 0 or NaN, since there is no previous value.
The important point: the type switches at some points in the data, sometimes within the same time interval, and when it switches back the value can be lower than the previous value of the same type. When that happens, .diff() returns a negative value.
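For reproducibility, the sample above could be built as a DataFrame roughly like this. The column names follow the table; parsing the datetime column with pd.to_datetime is an assumption about the intended dtype.

```python
import pandas as pd

# Sample data copied from the table above.
df = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2021-08-13 23:46:00", "2021-08-13 23:47:00", "2021-08-13 23:48:00",
        "2021-08-13 23:48:00", "2021-08-13 23:49:00", "2021-08-13 23:50:00",
        "2021-08-13 23:50:00", "2021-08-13 23:51:00", "2021-08-13 23:51:00",
    ]),
    "type": ["type1", "type1", "type1",
             "type2", "type2", "type2",
             "type1", "type1", "type1"],
    "value": [10, 20, 40, 5, 15, 20, 25, 35, 40],
})

print(df)
```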
What I'm trying to achieve: when the result of .diff() is negative, the value of that row must be set to 0 or NaN, and the value of the next row must be equal to
current value - original value of the previous row
I tried using diff with groupby, iterating dataframe etc. but couldn't manage to do it.
To give an example, I want the dataframe to be like this:
datetime type value
2021-08-13 23:46:00 type1 NaN
2021-08-13 23:47:00 type1 10
2021-08-13 23:48:00 type1 20
2021-08-13 23:48:00 type2 NaN
2021-08-13 23:49:00 type2 10
2021-08-13 23:50:00 type2 5
2021-08-13 23:50:00 type1 NaN
2021-08-13 23:51:00 type1 10
2021-08-13 23:51:00 type1 5
Edit: When I use:
df['value'] = df.groupby('type')['value'].diff()
The data becomes like this:
datetime type value
2021-08-13 23:46:00 type1 NaN
2021-08-13 23:47:00 type1 10
2021-08-13 23:48:00 type1 20
2021-08-13 23:48:00 type2 NaN
2021-08-13 23:49:00 type2 10
2021-08-13 23:50:00 type2 5
2021-08-13 23:50:00 type1 -15
2021-08-13 23:51:00 type1 10
2021-08-13 23:51:00 type1 5
The value in the 7th row (-15) is not what I'm trying to achieve.
Upvotes: 0
Views: 1403
Reputation: 260790
You can group by 'type' and apply diff:
df['value'] = df.groupby('type')['value'].diff()
output:
datetime type value
0 2021-08-13 23:46:00 type1 NaN
1 2021-08-13 23:47:00 type1 10.0
2 2021-08-13 23:48:00 type1 20.0
3 2021-08-13 23:48:00 type2 NaN
4 2021-08-13 23:49:00 type2 10.0
To keep only positive values and set everything else to NaN:
df['value'] = df.groupby('type')['value'].diff().where(lambda x: x > 0)
output on new dataset:
datetime type value
0 2021-08-13 23:46:00 type1 NaN
1 2021-08-13 23:47:00 type1 10.0
2 2021-08-13 23:48:00 type1 20.0
3 2021-08-13 23:48:00 type2 NaN
4 2021-08-13 23:49:00 type2 10.0
5 2021-08-13 23:50:00 type2 5.0
6 2021-08-13 23:50:00 type1 NaN
7 2021-08-13 23:51:00 type1 10.0
8 2021-08-13 23:51:00 type1 5.0
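As a self-contained check, here is a minimal sketch of the same idea on the nine-row sample from the question. It writes the result to a new column named delta (a name chosen here for illustration) so the original values are not overwritten, and it omits the datetime column because the per-type diff does not depend on it.

```python
import pandas as pd

# Types and values from the question's sample, in row order.
df = pd.DataFrame({
    "type": ["type1", "type1", "type1",
             "type2", "type2", "type2",
             "type1", "type1", "type1"],
    "value": [10, 20, 40, 5, 15, 20, 25, 35, 40],
})

# Row-to-row difference within each type; the first row of each type is NaN.
diffs = df.groupby("type")["value"].diff()

# Keep strictly positive differences; everything else becomes NaN.
df["delta"] = diffs.where(diffs > 0)

print(df)
```

Row 6 (type1, value 25) has a raw difference of -15 and ends up as NaN, while row 7 is still computed against the original 25, giving 10.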
And to replace negative values with 0:
df['value'] = df.groupby('type')['value'].diff().clip(lower=0)
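A short sketch of the clip variant on the same sample, assuming the result goes into a new column (delta and delta_filled are names chosen here for illustration). Note that clip only touches the negative difference; the first row of each type stays NaN unless it is filled explicitly, for example with fillna(0).

```python
import pandas as pd

# Types and values from the question's sample, in row order.
df = pd.DataFrame({
    "type": ["type1", "type1", "type1",
             "type2", "type2", "type2",
             "type1", "type1", "type1"],
    "value": [10, 20, 40, 5, 15, 20, 25, 35, 40],
})

# Negative differences are raised to 0; NaN entries are left as NaN.
df["delta"] = df.groupby("type")["value"].diff().clip(lower=0)

# Optional: also turn the leading NaN of each type into 0.
df["delta_filled"] = df["delta"].fillna(0)

print(df)
```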
Upvotes: 2