Reputation: 827
I have a dataframe with 3 columns, GP, LL, and QN:
GP LL QN
-------------------------
0 A0001 AA NaN
1 A0001 BB 3.0
2 A0001 CC 5.0
3 A0001 DD 14.0
4 A0001 EE NaN
... ... ... ...
Each GP value repeats for a series of records, with different values of LL for each GP. Each LL value has an individual numeric value in QN, some of which may be NaN. The values in LL may repeat across different values of GP, but do not repeat for the same value of GP. Each GP does not necessarily have the same values for LL as other values of GP, though some may.
I want to write a function which takes as an input a value of LL and returns a copy of the data frame with a new column, SM. The values for each row in this would be the cumsum of the values in QN from that row +1, down to the input value of LL.
So for example if the input to this function was "DD", using some of the records above, you might get a result like:
GP LL QN SM
----------------------------
0 A0001 AA NaN 22.0
1 A0001 BB 3.0 19.0
2 A0001 CC 5.0 14.0
3 A0001 DD 14.0 0.0
4 A0001 EE NaN -8.0
... ... ... ...
Struggling to find a way to accomplish this, short some really inefficient uses of apply.
Upvotes: 1
Views: 282
Reputation: 862911
If need reverse cumulative sum with shifting per groups use:
print (df)
GP LL QN
0 A0001 AA NaN
1 A0001 BB 3.0
2 A0001 CC 5.0
3 A0001 DD 14.0
4 A0001 EE NaN
5 A0001 CC 2.0
6 A0001 DD 3.0
7 A0001 EE 8.0
8 A0001 FF 4.0
val = 'DD'
#reverse all rows
df1 = df.iloc[::-1]
#create groups by compare val
g = df1['LL'].eq(val).groupby(df1['GP']).cumsum()
#rows after last val set to `NaN` in later filtering `df1[m]`
m = g.gt(0)
#filtering and grouping by GP with Series g for shift with cumulative sum
df['SM'] = df1[m].groupby(['GP',g]).transform(lambda x: x.shift().cumsum()).iloc[::-1]
print (df)
GP LL QN SM
0 A0001 AA NaN 22.0
1 A0001 BB 3.0 19.0
2 A0001 CC 5.0 14.0
3 A0001 DD 14.0 NaN
4 A0001 EE NaN 5.0
5 A0001 CC 2.0 3.0
6 A0001 DD 3.0 NaN
7 A0001 EE 8.0 NaN
8 A0001 FF 4.0 NaN
Upvotes: 2