Pandas groupby.diff() not returning expected output

Question

I have an outer group and an inner group, and I want to find the difference within each inner group, depending on the outer group. Normally I can nest the inner group within each outer group using groupby, but for some reason groupby's diff returns a flat result, indexed by the original rows, instead of a nested one.

import numpy as np
import pandas as pd
df = pd.DataFrame({'inner':list('aabbccddee'),'outer':[0,0,1,1,0,0,1,1,0,0],
    'value':np.random.randint(0,100,10)})

    inner  outer  value
0     a      0     78
1     a      0     68
2     b      1     78
3     b      1     22
4     c      0     53
5     c      0     25
6     d      1     82
7     d      1     38
8     e      0      2
9     e      0     39

If I want, for example, the sum of each inner group within each outer group, I simply use groupby:

In [19]: df.groupby(['outer','inner']).sum()
Out[19]:
             value
outer inner
0     a        146
      c         78
      e         41
1     b        100
      d        120

The above is the correct output, and the same pattern works for every other function I've tried except diff. With diff I want output in a format similar to the above, but instead I get:

In [20]: df.groupby(['outer','inner']).diff()
Out[20]:
   value
0    NaN
1  -10.0
2    NaN
3  -56.0
4    NaN
5  -28.0
6    NaN
7  -44.0
8    NaN
9   37.0

The above is equivalent to df.groupby(['inner']).value.diff(), so it seems groupby is not considering the outer group. I can find workarounds for this without any problem, but using groupby would be more elegant and succinct. Does anyone know why this is happening and how it can be remedied?
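A quick way to check whether the outer key is really being ignored: in the question's data each inner label belongs to only one outer value, so grouping by (outer, inner) and by inner alone produce exactly the same groups, and therefore the same diff. The sketch below uses hypothetical data (df2, not from the question) in which the label 'a' appears under both outer values, so the two groupings can actually differ.

```python
import pandas as pd

# Hypothetical data: inner label 'a' appears under BOTH outer values,
# unlike the question's data where each inner label has a single outer value.
df2 = pd.DataFrame({
    'inner': ['a', 'a', 'a', 'a'],
    'outer': [0, 0, 1, 1],
    'value': [10, 15, 100, 130],
})

# Grouping by both keys: (0, 'a') and (1, 'a') are diffed separately,
# so row 2 starts a new group and gets NaN.
both = df2.groupby(['outer', 'inner'])['value'].diff()

# Grouping by inner only: all four 'a' rows form one group,
# so row 2 is diffed against row 1 (100 - 15 = 85).
inner_only = df2.groupby('inner')['value'].diff()

print(pd.DataFrame({'both_keys': both, 'inner_only': inner_only}))
```

Row 2 is where the two results diverge, which suggests the outer key is respected; the flat shape of the output is a separate issue.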

anky · Accepted Answer

Functions like s.diff(), cumsum() etc. are not aggregation functions: they return one value per input row, so the result has the same shape (and index) as the original Series rather than one row per group. You could use np.diff() here instead, for example:

# .item() assumes every (outer, inner) group has exactly two rows
print(df.groupby(['outer','inner'])['value'].apply(lambda x: np.diff(x).item()))

outer  inner
0      a       -10
       c       -28
       e        37
1      b       -56
       d       -44
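The .item() call above only works when every (outer, inner) group has exactly two rows; with three or more rows np.diff returns several values and .item() raises a ValueError. The following is a minimal sketch for groups of any size, assuming you want the non-NaN differences indexed by the group keys. It keeps diff() as a row-wise operation, whose result is aligned to the original index (which is why it came back flat), and then moves the keys into the index. The column name value_diff is hypothetical, and the fixed values are copied from the question's table in place of np.random.randint so the result is reproducible.

```python
import pandas as pd

# Same layout as the question, with the values from its printed table.
df = pd.DataFrame({
    'inner': list('aabbccddee'),
    'outer': [0, 0, 1, 1, 0, 0, 1, 1, 0, 0],
    'value': [78, 68, 78, 22, 53, 25, 82, 38, 2, 39],
})

g = df.groupby(['outer', 'inner'])['value']

# diff() returns one row per input row, aligned to df.index (10 rows),
# whereas an aggregation like sum() returns one row per group (5 rows).
row_diffs = g.diff()
assert row_diffs.index.equals(df.index)

# Attach the group keys as a MultiIndex and drop each group's leading NaN.
nested = (
    df.assign(value_diff=row_diffs)
      .set_index(['outer', 'inner'])['value_diff']
      .dropna()
      .sort_index()
)
print(nested)
```

With groups of three or more rows, each (outer, inner) key simply appears once per difference instead of raising an error.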
