joseph

Reputation: 301

pandas diff within successive groups

import pandas as pd

d = pd.DataFrame({'a': [7, 6, 3, 4, 8], 'b': ['c', 'c', 'd', 'd', 'c']})
d.groupby('b')['a'].diff()

This gives me:

0    NaN
1   -1.0
2    NaN
3    1.0
4    2.0

What I need is:

0    NaN
1   -1.0
2    NaN
3    1.0
4    NaN  

That is, the difference between successive values only within each consecutive run of a group, so when a group reappears after another group, its previous values are ignored.

In my example, the last c value starts a new c group.

Upvotes: 0

Views: 105

Answers (1)

Zero

Reputation: 76967

You need to group by consecutive segments of b rather than by the values of b:

In [1055]: d.groupby((d.b != d.b.shift()).cumsum())['a'].diff()
Out[1055]:
0    NaN
1   -1.0
2    NaN
3    1.0
4    NaN
Name: a, dtype: float64

Details

In [1056]: (d.b != d.b.shift()).cumsum()
Out[1056]:
0    1
1    1
2    2
3    2
4    3
Name: b, dtype: int32
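
To make the grouping key easier to follow, here is a rough step-by-step sketch of the same expression. The names starts, run_id and result are only illustrative and do not appear in the original one-liner.

```python
import pandas as pd

d = pd.DataFrame({'a': [7, 6, 3, 4, 8], 'b': ['c', 'c', 'd', 'd', 'c']})

# True wherever b differs from the previous row. The first row is compared
# against NaN (from shift), so it is always marked as the start of a run.
starts = d['b'] != d['b'].shift()

# The running count of run starts gives every consecutive run its own id,
# so the second block of 'c' gets a different id than the first.
run_id = starts.cumsum()

# Diff within each consecutive run instead of within each distinct value of b.
result = d['a'].groupby(run_id).diff()
print(result)
```

Because the key is built from row-to-row changes, a value of b that reappears later is treated as a fresh group, which is what the question asks for.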

Upvotes: 2
