Reputation: 10223
I have an outer group and an inner group, and I want to find the difference within each inner group for each outer group. Normally I can nest the inner group within each outer group using groupby, but for some reason the groupby diff function returns a flat vector instead of a nested (outer, inner) result.
import numpy as np
import pandas as pd
df = pd.DataFrame({'inner': list('aabbccddee'), 'outer': [0, 0, 1, 1, 0, 0, 1, 1, 0, 0],
                   'value': np.random.randint(0, 100, 10)})
inner outer value
0 a 0 78
1 a 0 68
2 b 1 78
3 b 1 22
4 c 0 53
5 c 0 25
6 d 1 82
7 d 1 38
8 e 0 2
9 e 0 39
If I want the sum of each inner group within each outer group, for example, I simply use groupby:
In [19]: df.groupby(['outer','inner']).sum()
Out[19]:
value
outer inner
0 a 146
c 78
e 41
1 b 100
d 120
The above is the correct output, and the same pattern works for the other functions I have tried, but not for diff. When I use diff, I want output in a format similar to the above, but instead I get:
In [20]: df.groupby(['outer','inner']).diff()
Out[20]:
value
0 NaN
1 -10.0
2 NaN
3 -56.0
4 NaN
5 -28.0
6 NaN
7 -44.0
8 NaN
9 37.0
The above is equivalent to df.groupby(['inner']).value.diff(), so it seems groupby is not considering the outer group. I can find workarounds for this without a problem, but using groupby would be more elegant and succinct. Does anyone know why this is happening and how it could be remedied?
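A quick check of what is going on, as a sketch that assumes the data printed above (fixed values copied from the printed frame in place of np.random.randint). In this frame every inner label appears under only one outer value, so grouping by ['outer', 'inner'] yields exactly the same groups as grouping by 'inner' alone. diff is a transform, so its result keeps the original row index instead of the group keys:

```python
import pandas as pd

# Copy of the question's data with the printed values fixed (not random)
df = pd.DataFrame({
    "inner": list("aabbccddee"),
    "outer": [0, 0, 1, 1, 0, 0, 1, 1, 0, 0],
    "value": [78, 68, 78, 22, 53, 25, 82, 38, 2, 39],
})

# In this data, each inner label belongs to exactly one outer group
one_outer_per_inner = df.groupby("inner")["outer"].nunique().eq(1).all()

# diff is a transform: one output row per input row, aligned on df's index
by_both = df.groupby(["outer", "inner"])["value"].diff()
by_inner = df.groupby("inner")["value"].diff()

print(one_outer_per_inner)             # True
print(by_both.index.equals(df.index))  # True
print(by_both.equals(by_inner))        # True
```

The equality of the two groupings is a property of this particular data; if an inner label appeared under two different outer values, the results would differ.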
Upvotes: 0
Views: 1097
Reputation: 75110
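Functions like s.diff() and cumsum are not aggregation functions: they return one value per input row rather than one value per group, so the result comes back with the shape and index of the original frame instead of being indexed by the group keys. Here you could use np.diff() inside apply instead, for example: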
# .item() works only because every group has exactly two rows, so np.diff returns a single value
print(df.groupby(['outer', 'inner'])['value'].apply(lambda x: np.diff(x).item()))
outer inner
0 a -10
c -28
e 37
1 b -56
d -44
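If groups can have more than two rows, the .item() call above would fail. One alternative, sketched here under the assumption that you want every within-group difference indexed by (outer, inner), is to keep groupby(...).diff() as a transform and then move the group keys into the index, dropping the NaN produced for the first row of each group:

```python
import pandas as pd

# Copy of the question's data with the printed values fixed (not random)
df = pd.DataFrame({
    "inner": list("aabbccddee"),
    "outer": [0, 0, 1, 1, 0, 0, 1, 1, 0, 0],
    "value": [78, 68, 78, 22, 53, 25, 82, 38, 2, 39],
})

# Row-aligned differences within each (outer, inner) group
diffs = df.groupby(["outer", "inner"])["value"].diff()

# Attach the differences, drop each group's leading NaN, and index by the group keys
nested = (
    df.assign(diff=diffs)
      .dropna(subset=["diff"])
      .set_index(["outer", "inner"])["diff"]
      .sort_index()
)
print(nested)
```

The values come out as floats because diff introduces NaN before they are dropped. A group with n rows contributes n - 1 entries under its (outer, inner) key, so this does not depend on every group having exactly two rows.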
Upvotes: 1