Atul Singh
Atul Singh

Reputation: 23

How to calculate row wise trailing sum which resets on every next date in data in python/pandas?

Below is what the data looks like

import pandas as pd

df = pd.DataFrame({'date':['2021-01-01','2021-01-01','2021-01-01','2021-01-01','2021-01-01',
                           '2021-01-02','2021-01-02','2021-01-02','2021-01-02','2021-01-02'],
                    'StP' : [500., 600., 700., 800., 900., 
                           500., 600., 700., 800., 900.],
                   'COI': [1., 2., 3., 4., 5.,
                           2., 3., 4., 5., 6.],
                   'POI': [3., 2., 4., 1., 5.,
                           2., 1., 3., 0., 5.],})

print(df)
date StP COI POI
2021-01-01 500.0 1.0 3.0
2021-01-01 600.0 2.0 2.0
2021-01-01 700.0 3.0 4.0
2021-01-01 800.0 4.0 1.0
2021-01-01 900.0 5.0 5.0
2021-01-02 500.0 2.0 2.0
2021-01-02 600.0 3.0 1.0
2021-01-02 700.0 4.0 3.0
2021-01-02 800.0 5.0 0.0
2021-01-02 900.0 6.0 5.0

I need to create two columns (wl_COI and wl_POI) each derived from COI and POI respectively such that wl_COI has the sum of all entries with StP higher than the current row for a particular date. Similarly, wl_POI will have sum of all entries with StP lower than or equal to current row for a particular date. Sample desired output below -

date StP COI POI wl_COI wl_POI
2021-01-01 500.0 1.0 3.0 14.0 3.0
2021-01-01 600.0 2.0 2.0 12.0 5.0
2021-01-01 700.0 3.0 4.0 9.0 9.0
2021-01-01 800.0 4.0 1.0 5.0 10.0
2021-01-01 900.0 5.0 5.0 0.0 15.0
2021-01-02 500.0 2.0 2.0 18.0 2.0
2021-01-02 600.0 3.0 1.0 15.0 3.0
2021-01-02 700.0 4.0 3.0 11.0 6.0
2021-01-02 800.0 5.0 0.0 6.0 6.0
2021-01-02 900.0 6.0 5.0 0.0 11.0

I can do this very easily in excel but I am new to python and can't seem to figure out how to do this. Thank you for your help!

Upvotes: 2

Views: 68

Answers (2)

Utsav
Utsav

Reputation: 5918

Considering the values of StP increases per row for each date.

Code

df.date = pd.to_datetime(df.date)
gb = df.groupby('date')
df['wl_COI'] = gb.COI.transform('sum') - gb.COI.cumsum()
df['wl_POI'] = gb.POI.cumsum()

Output

    date        StP     COI POI wl_COI  wl_POI
0   2021-01-01  500.0   1.0 3.0 14.0    3.0
1   2021-01-01  600.0   2.0 2.0 12.0    5.0
2   2021-01-01  700.0   3.0 4.0 9.0     9.0
3   2021-01-01  800.0   4.0 1.0 5.0     10.0
4   2021-01-01  900.0   5.0 5.0 0.0     15.0
5   2021-01-02  500.0   2.0 2.0 18.0    2.0
6   2021-01-02  600.0   3.0 1.0 15.0    3.0
7   2021-01-02  700.0   4.0 3.0 11.0    6.0
8   2021-01-02  800.0   5.0 0.0 6.0     6.0
9   2021-01-02  900.0   6.0 5.0 0.0     11.0

Upvotes: 0

Giorgos Myrianthous
Giorgos Myrianthous

Reputation: 39900

This should do the trick:

import pandas as pd 
import numpy as np


def compute_wl_cio(df, row):
    return sum(df['COI'].loc[np.logical_and(df['date']==row['date'], df['StP']>row['StP'])])

def compute_wl_pio(df, row):
    return sum(df['POI'].loc[np.logical_and(df['date']==row['date'], df['StP']<=row['StP'])])

And here's the output:

         date    StP  COI  POI  wl_COI  wl_POI
0  2021-01-01  500.0  1.0  3.0    14.0     3.0
1  2021-01-01  600.0  2.0  2.0    12.0     5.0
2  2021-01-01  700.0  3.0  4.0     9.0     9.0
3  2021-01-01  800.0  4.0  1.0     5.0    10.0
4  2021-01-01  900.0  5.0  5.0     0.0    15.0
5  2021-01-02  500.0  2.0  2.0    18.0     2.0
6  2021-01-02  600.0  3.0  1.0    15.0     3.0
7  2021-01-02  700.0  4.0  3.0    11.0     6.0
8  2021-01-02  800.0  5.0  0.0     6.0     6.0
9  2021-01-02  900.0  6.0  5.0     0.0    11.0

You can even do it in just a single function:

import operator 


def compute_wl(df, row, col, op):
    return sum(df[col].loc[np.logical_and(df['date']==row['date'], op(df['StP'], row['StP']))])


df['wl_COI'] = df.apply(lambda row: compute_wl(df, row, 'COI', operator.gt), axis=1)
df['wl_POI'] = df.apply(lambda row: compute_wl(df, row, 'POI', operator.le), axis=1)

Upvotes: 0

Related Questions