bhdrozgn

Reputation: 197

Pandas .diff() with same index and with condition

I have data similar to the following:

datetime                type                value
2021-08-13 23:46:00     type1               10
2021-08-13 23:47:00     type1               20
2021-08-13 23:48:00     type1               40
2021-08-13 23:48:00     type2               5
2021-08-13 23:49:00     type2               15
2021-08-13 23:50:00     type2               20
2021-08-13 23:50:00     type1               25
2021-08-13 23:51:00     type1               35
2021-08-13 23:51:00     type1               40

As you can see, there can be rows with the same datetime index but a different type. The type field can only be type1 or type2.

I need the differences between consecutive values of the same type, so diff() must only compare rows that share a type. The first value of each type should become 0 or NaN, since there is no previous value.

The important point: the type switches at some points in the data, within the same time interval, and when it switches back the value is sometimes lower than the previous value of the same type. In that case .diff() produces a negative value.

What I'm trying to achieve: when the result of .diff() is negative, that row must be set to 0 or NaN, and the next row must equal

current value - original value of the previous row

I tried diff with groupby, iterating over the DataFrame, etc., but couldn't get it to work.

To give an example, I want the dataframe to be like this:

datetime                type                value
2021-08-13 23:46:00     type1               NaN
2021-08-13 23:47:00     type1               10
2021-08-13 23:48:00     type1               20
2021-08-13 23:48:00     type2               NaN
2021-08-13 23:49:00     type2               10
2021-08-13 23:50:00     type2               5
2021-08-13 23:50:00     type1               NaN
2021-08-13 23:51:00     type1               10
2021-08-13 23:51:00     type1               5

Edit: When I use:

df['value'] = df.groupby('type')['value'].diff()

The data becomes like this:

datetime                type                value
2021-08-13 23:46:00     type1               NaN
2021-08-13 23:47:00     type1               10
2021-08-13 23:48:00     type1               20
2021-08-13 23:48:00     type2               NaN
2021-08-13 23:49:00     type2               10
2021-08-13 23:50:00     type2               5
2021-08-13 23:50:00     type1               -15
2021-08-13 23:51:00     type1               10
2021-08-13 23:51:00     type1               5

The value in the 7th row (-15) differs from what I'm trying to achieve (NaN).

Upvotes: 0

Views: 1403

Answers (1)

mozway

Reputation: 260790

You can group by 'type' and apply diff:

df['value'] = df.groupby('type')['value'].diff()

output:

              datetime   type  value
0  2021-08-13 23:46:00  type1    NaN
1  2021-08-13 23:47:00  type1   10.0
2  2021-08-13 23:48:00  type1   20.0
3  2021-08-13 23:48:00  type2    NaN
4  2021-08-13 23:49:00  type2   10.0
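
To illustrate why grouping handles the interleaved types, here is a minimal sketch that rebuilds the question's sample data (with datetime kept as plain strings, which is an assumption made for brevity) and compares the grouped diff against a manual subtraction of the previous same-type value. The variable names are illustrative only.

```python
import pandas as pd

# Sample data copied from the question; 'datetime' is kept as plain strings for brevity.
df = pd.DataFrame({
    "datetime": [
        "2021-08-13 23:46:00", "2021-08-13 23:47:00", "2021-08-13 23:48:00",
        "2021-08-13 23:48:00", "2021-08-13 23:49:00", "2021-08-13 23:50:00",
        "2021-08-13 23:50:00", "2021-08-13 23:51:00", "2021-08-13 23:51:00",
    ],
    "type": ["type1"] * 3 + ["type2"] * 3 + ["type1"] * 3,
    "value": [10, 20, 40, 5, 15, 20, 25, 35, 40],
})

# Previous value within the same type, skipping any rows of the other type in between.
prev_same_type = df.groupby("type")["value"].shift()

# Subtracting it by hand gives the same numbers as groupby(...).diff().
manual_diff = df["value"] - prev_same_type
grouped_diff = df.groupby("type")["value"].diff()
```

Row 6 (type1, value 25) is compared with row 2 (type1, value 40), not with the type2 row directly above it, which is where the -15 in the question comes from.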

To keep only positive values and set everything else to NaN (the default fill value of where):

df['value'] = df.groupby('type')['value'].diff().where(lambda x: x > 0)

output on new dataset:

              datetime   type  value
0  2021-08-13 23:46:00  type1    NaN
1  2021-08-13 23:47:00  type1   10.0
2  2021-08-13 23:48:00  type1   20.0
3  2021-08-13 23:48:00  type2    NaN
4  2021-08-13 23:49:00  type2   10.0
5  2021-08-13 23:50:00  type2    5.0
6  2021-08-13 23:50:00  type1    NaN
7  2021-08-13 23:51:00  type1   10.0
8  2021-08-13 23:51:00  type1    5.0
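
One caveat with the x > 0 condition: a difference of exactly 0 (two equal consecutive readings) is also turned into NaN. If zeros should be kept and only negative differences blanked, Series.mask may be a closer fit. A small sketch on hypothetical data, not taken from the question:

```python
import pandas as pd

# Hypothetical data: the second reading repeats the first, so its diff is exactly 0.
df = pd.DataFrame({
    "type": ["type1", "type1", "type1", "type1"],
    "value": [10, 10, 30, 5],
})

diff = df.groupby("type")["value"].diff()

strictly_positive = diff.where(diff > 0)  # 0 and negative diffs both become NaN
non_negative = diff.mask(diff < 0)        # only negative diffs become NaN; 0 is kept
```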

And to replace negative values with 0:

df['value'] = df.groupby('type')['value'].diff().clip(lower=0)
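
Note that clip only affects the negative differences; the first row of each type is still NaN because it has no predecessor. If those rows should be 0 as well, which the question allows, chaining fillna(0) is one option. A minimal sketch using the question's values (names are illustrative):

```python
import pandas as pd

# 'type' and 'value' columns from the question's sample data.
df = pd.DataFrame({
    "type": ["type1"] * 3 + ["type2"] * 3 + ["type1"] * 3,
    "value": [10, 20, 40, 5, 15, 20, 25, 35, 40],
})

diff = df.groupby("type")["value"].diff()

clipped = diff.clip(lower=0)  # negative diffs become 0; NaN is left as NaN
all_zero = clipped.fillna(0)  # additionally turn the first row of each type into 0
```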

Upvotes: 2
