pandas create column as lagged difference of two other columns grouped by key

Question

I have the following dataframe (df)


                       AmountNeeded         AmountAvailable
Source Target                                              
1      2                      290.0                   600.0
       4                      300.0                   600.0
       6                      200.0                   600.0
3      2                      290.0                   450.0
       5                      100.0                   450.0 
7      8                        0.0                   500.0

I would like to compute the remaining availability per source:


                       AmountNeeded    AmountAvailable       RemainingAvailability
Source Target                                              
1      2                      290.0             600.0                          600
       4                      300.0             600.0                          310
       6                      200.0             600.0                           10
3      2                      290.0             450.0                          450
       5                      100.0             450.0                          160
7      8                        0.0             500.0                          500

So if a Source appears more than once, I need to subtract the sum of lagged values of AmountNeeded for that particular Source.
If we take Source 1 and Target 4 the remaining amount should be AmountAvailable-AmountNeeded(previous_row) = 600 - 290 = 310
If we move to Source 1 and Target 6 this will be: 600 - (290+300) = 10.
This also be computed as RemainingAvailability - AmountNeeded = 310 - 300 = 10

I tried to use different combinations of groupby and diff but without much success.

jezrael · Accepted Answer

Use Series.sub with helper Series created by lambda function with Series.shift and cumulative sum Series.cumsum:

s = df.groupby(level=0)['AmountNeeded'].apply(lambda x: x.shift(fill_value=0).cumsum())
df['RemainingAvailability'] = df['AmountAvailable'].sub(s)
print (df)

               AmountNeeded  AmountAvailable  RemainingAvailability
Source Target                                                      
1      2              290.0            600.0                  600.0
       4              300.0            600.0                  310.0
       6              200.0            600.0                   10.0
3      2              290.0            450.0                  450.0
       5              100.0            450.0                  160.0
7      8                0.0            500.0                  500.0

pandas create column as lagged difference of two other columns grouped by key

Answers (1)

Related Questions