Greg Hinch
Greg Hinch

Reputation: 827

How can I reverse cumsum within groups to a specific row with pandas?

I have a dataframe with 3 columns, GP, LL, and QN:

        GP     LL    QN 
-------------------------
0       A0001  AA    NaN
1       A0001  BB    3.0
2       A0001  CC    5.0 
3       A0001  DD    14.0
4       A0001  EE    NaN
...     ...    ...   ...

Each GP value repeats for a series of records, with different values of LL for each GP. Each LL value has an individual numeric value in QN, some of which may be NaN. The values in LL may repeat across different values of GP, but do not repeat for the same value of GP. Each GP does not necessarily have the same values for LL as other values of GP, though some may.

I want to write a function which takes as an input a value of LL and returns a copy of the data frame with a new column, SM. The values for each row in this would be the cumsum of the values in QN from that row +1, down to the input value of LL.

So for example if the input to this function was "DD", using some of the records above, you might get a result like:

        GP     LL    QN   SM
----------------------------
0       A0001  AA    NaN  22.0
1       A0001  BB    3.0  19.0
2       A0001  CC    5.0  14.0
3       A0001  DD    14.0 0.0
4       A0001  EE    NaN  -8.0
...     ...    ...   ...

Struggling to find a way to accomplish this, short some really inefficient uses of apply.

Upvotes: 1

Views: 282

Answers (1)

jezrael
jezrael

Reputation: 862911

If need reverse cumulative sum with shifting per groups use:

print (df)
      GP  LL    QN
0  A0001  AA   NaN
1  A0001  BB   3.0
2  A0001  CC   5.0
3  A0001  DD  14.0
4  A0001  EE   NaN
5  A0001  CC   2.0
6  A0001  DD   3.0
7  A0001  EE   8.0
8  A0001  FF   4.0

val = 'DD'

#reverse all rows
df1 = df.iloc[::-1]
#create groups by compare val
g = df1['LL'].eq(val).groupby(df1['GP']).cumsum()
#rows after last val set to `NaN` in later filtering `df1[m]`
m = g.gt(0)

#filtering and grouping by GP with Series g for shift with cumulative sum
df['SM'] = df1[m].groupby(['GP',g]).transform(lambda x: x.shift().cumsum()).iloc[::-1]

print (df)
      GP  LL    QN    SM
0  A0001  AA   NaN  22.0
1  A0001  BB   3.0  19.0
2  A0001  CC   5.0  14.0
3  A0001  DD  14.0   NaN
4  A0001  EE   NaN   5.0
5  A0001  CC   2.0   3.0
6  A0001  DD   3.0   NaN
7  A0001  EE   8.0   NaN
8  A0001  FF   4.0   NaN

Upvotes: 2

Related Questions