Zanam
Zanam

Reputation: 4807

Pandas cumsum after sorting on different column

I have a dataframe df as following:

Col1    Col2    SortVal Qty    Col3
R1      Grp1    0.88    160.8  1
R2      Grp1    2.28    20.9   3
R1      Grp1    2.09    184.6  5  
R1      Grp1    2.44    55.2   7
R2      Grp1    0.61    80.4   9 
R1      Grp1    2.41    41.9   1
R2      Grp1    1.66    92.3   4
R2      Grp1    2.3     27.6   6

I need to do the following:

I am trying the following:

df.sort_values('SortVal', ascending=True).groupby(['Col1', 'Col2']).cumsum()

But I am not getting the following desired solution:

Col1    Col2    SortVal Qty     cumS    Col3
R1      Grp1    0.88    160.8   160.8   1
R1      Grp1    2.09    184.6   345.4   5
R1      Grp1    2.41    41.9    387.3   1
R1      Grp1    2.44    55.2    442.5   7
R2      Grp1    0.61    80.4    80.4    9
R2      Grp1    1.66    92.3    172.7   4
R2      Grp1    2.28    20.9    193.6   3
R2      Grp1    2.3     27.6    221.2   6

Upvotes: 2

Views: 512

Answers (2)

MrNobody33
MrNobody33

Reputation: 6483

This could be a way, using apply and sot_values, to sort each group, and then assign to get the cumsum:

df.groupby(['Col1', 'Col2'],as_index=False)
 .apply(lambda x: x.sort_values('SortVal', ascending=True)
 .assign(cumS=x.sort_values('SortVal', ascending=True)['Qty'].cumsum()))
 .reset_index(drop=True)

Output:

  Col1  Col2  SortVal    Qty  Col3       cumS
0   R1  Grp1     0.88  160.8     1       160.8
1   R1  Grp1     2.09  184.6     5       345.4
2   R1  Grp1     2.41   41.9     1       387.3
3   R1  Grp1     2.44   55.2     7       442.5
4   R2  Grp1     0.61   80.4     9        80.4
5   R2  Grp1     1.66   92.3     4       172.7
6   R2  Grp1     2.28   20.9     3       193.6
7   R2  Grp1     2.30   27.6     6       221.2

Upvotes: 1

cs95
cs95

Reputation: 402673

Sort in advance, then group and compute the cumsum. After that, sort again:

(df.assign(cumS=df.sort_values(['Col1', 'Col2', 'SortVal')
                  .groupby(['Col1', 'Col2'], sort=False)['Qty']
                  .cumsum())
   .sort_values(['Col1', 'Col2', 'SortVal']))     

  Col1  Col2  SortVal    Qty  Col3   cumS
0   R1  Grp1     0.88  160.8     1  160.8
2   R1  Grp1     2.09  184.6     5  345.4
5   R1  Grp1     2.41   41.9     1  387.3
3   R1  Grp1     2.44   55.2     7  442.5
4   R2  Grp1     0.61   80.4     9   80.4
6   R2  Grp1     1.66   92.3     4  172.7
1   R2  Grp1     2.28   20.9     3  193.6
7   R2  Grp1     2.30   27.6     6  221.2

Another method, similar to the first but reduces one sort step in exchange for an apply:

(df.sort_values(['Col1', 'Col2', 'SortVal']) 
   .groupby(['Col1', 'Col2'], sort=False) 
   .apply(lambda x: x.assign(cumS=x['Qty'].cumsum())))

  Col1  Col2  SortVal    Qty  Col3   cumS
0   R1  Grp1     0.88  160.8     1  160.8
2   R1  Grp1     2.09  184.6     5  345.4
5   R1  Grp1     2.41   41.9     1  387.3
3   R1  Grp1     2.44   55.2     7  442.5
4   R2  Grp1     0.61   80.4     9   80.4
6   R2  Grp1     1.66   92.3     4  172.7
1   R2  Grp1     2.28   20.9     3  193.6
7   R2  Grp1     2.30   27.6     6  221.2

Upvotes: 1

Related Questions