How to calculate row wise trailing sum which resets on every next date in data in python/pandas?

Question

Below is what the data looks like

import pandas as pd

df = pd.DataFrame({'date':['2021-01-01','2021-01-01','2021-01-01','2021-01-01','2021-01-01',
                           '2021-01-02','2021-01-02','2021-01-02','2021-01-02','2021-01-02'],
                    'StP' : [500., 600., 700., 800., 900., 
                           500., 600., 700., 800., 900.],
                   'COI': [1., 2., 3., 4., 5.,
                           2., 3., 4., 5., 6.],
                   'POI': [3., 2., 4., 1., 5.,
                           2., 1., 3., 0., 5.],})

print(df)

date	StP	COI	POI
2021-01-01	500.0	1.0	3.0
2021-01-01	600.0	2.0	2.0
2021-01-01	700.0	3.0	4.0
2021-01-01	800.0	4.0	1.0
2021-01-01	900.0	5.0	5.0
2021-01-02	500.0	2.0	2.0
2021-01-02	600.0	3.0	1.0
2021-01-02	700.0	4.0	3.0
2021-01-02	800.0	5.0	0.0
2021-01-02	900.0	6.0	5.0

I need to create two columns (wl_COI and wl_POI) each derived from COI and POI respectively such that wl_COI has the sum of all entries with StP higher than the current row for a particular date. Similarly, wl_POI will have sum of all entries with StP lower than or equal to current row for a particular date. Sample desired output below -

date	StP	COI	POI	wl_COI	wl_POI
2021-01-01	500.0	1.0	3.0	14.0	3.0
2021-01-01	600.0	2.0	2.0	12.0	5.0
2021-01-01	700.0	3.0	4.0	9.0	9.0
2021-01-01	800.0	4.0	1.0	5.0	10.0
2021-01-01	900.0	5.0	5.0	0.0	15.0
2021-01-02	500.0	2.0	2.0	18.0	2.0
2021-01-02	600.0	3.0	1.0	15.0	3.0
2021-01-02	700.0	4.0	3.0	11.0	6.0
2021-01-02	800.0	5.0	0.0	6.0	6.0
2021-01-02	900.0	6.0	5.0	0.0	11.0

I can do this very easily in excel but I am new to python and can't seem to figure out how to do this. Thank you for your help!

Giorgos Myrianthous · Accepted Answer

This should do the trick:

import pandas as pd 
import numpy as np


def compute_wl_cio(df, row):
    return sum(df['COI'].loc[np.logical_and(df['date']==row['date'], df['StP']>row['StP'])])

def compute_wl_pio(df, row):
    return sum(df['POI'].loc[np.logical_and(df['date']==row['date'], df['StP']<=row['StP'])])

And here's the output:

         date    StP  COI  POI  wl_COI  wl_POI
0  2021-01-01  500.0  1.0  3.0    14.0     3.0
1  2021-01-01  600.0  2.0  2.0    12.0     5.0
2  2021-01-01  700.0  3.0  4.0     9.0     9.0
3  2021-01-01  800.0  4.0  1.0     5.0    10.0
4  2021-01-01  900.0  5.0  5.0     0.0    15.0
5  2021-01-02  500.0  2.0  2.0    18.0     2.0
6  2021-01-02  600.0  3.0  1.0    15.0     3.0
7  2021-01-02  700.0  4.0  3.0    11.0     6.0
8  2021-01-02  800.0  5.0  0.0     6.0     6.0
9  2021-01-02  900.0  6.0  5.0     0.0    11.0

You can even do it in just a single function:

import operator 


def compute_wl(df, row, col, op):
    return sum(df[col].loc[np.logical_and(df['date']==row['date'], op(df['StP'], row['StP']))])


df['wl_COI'] = df.apply(lambda row: compute_wl(df, row, 'COI', operator.gt), axis=1)
df['wl_POI'] = df.apply(lambda row: compute_wl(df, row, 'POI', operator.le), axis=1)

How to calculate row wise trailing sum which resets on every next date in data in python/pandas?

Answers (2)

Related Questions