Reputation: 23
Below is what the data looks like
import pandas as pd
df = pd.DataFrame({'date':['2021-01-01','2021-01-01','2021-01-01','2021-01-01','2021-01-01',
'2021-01-02','2021-01-02','2021-01-02','2021-01-02','2021-01-02'],
'StP' : [500., 600., 700., 800., 900.,
500., 600., 700., 800., 900.],
'COI': [1., 2., 3., 4., 5.,
2., 3., 4., 5., 6.],
'POI': [3., 2., 4., 1., 5.,
2., 1., 3., 0., 5.],})
print(df)
date | StP | COI | POI |
---|---|---|---|
2021-01-01 | 500.0 | 1.0 | 3.0 |
2021-01-01 | 600.0 | 2.0 | 2.0 |
2021-01-01 | 700.0 | 3.0 | 4.0 |
2021-01-01 | 800.0 | 4.0 | 1.0 |
2021-01-01 | 900.0 | 5.0 | 5.0 |
2021-01-02 | 500.0 | 2.0 | 2.0 |
2021-01-02 | 600.0 | 3.0 | 1.0 |
2021-01-02 | 700.0 | 4.0 | 3.0 |
2021-01-02 | 800.0 | 5.0 | 0.0 |
2021-01-02 | 900.0 | 6.0 | 5.0 |
I need to create two columns (wl_COI and wl_POI) each derived from COI and POI respectively such that wl_COI has the sum of all entries with StP higher than the current row for a particular date. Similarly, wl_POI will have sum of all entries with StP lower than or equal to current row for a particular date. Sample desired output below -
date | StP | COI | POI | wl_COI | wl_POI |
---|---|---|---|---|---|
2021-01-01 | 500.0 | 1.0 | 3.0 | 14.0 | 3.0 |
2021-01-01 | 600.0 | 2.0 | 2.0 | 12.0 | 5.0 |
2021-01-01 | 700.0 | 3.0 | 4.0 | 9.0 | 9.0 |
2021-01-01 | 800.0 | 4.0 | 1.0 | 5.0 | 10.0 |
2021-01-01 | 900.0 | 5.0 | 5.0 | 0.0 | 15.0 |
2021-01-02 | 500.0 | 2.0 | 2.0 | 18.0 | 2.0 |
2021-01-02 | 600.0 | 3.0 | 1.0 | 15.0 | 3.0 |
2021-01-02 | 700.0 | 4.0 | 3.0 | 11.0 | 6.0 |
2021-01-02 | 800.0 | 5.0 | 0.0 | 6.0 | 6.0 |
2021-01-02 | 900.0 | 6.0 | 5.0 | 0.0 | 11.0 |
I can do this very easily in excel but I am new to python and can't seem to figure out how to do this. Thank you for your help!
Upvotes: 2
Views: 68
Reputation: 5918
Considering the values of StP increases per row
for each date
.
Code
df.date = pd.to_datetime(df.date)
gb = df.groupby('date')
df['wl_COI'] = gb.COI.transform('sum') - gb.COI.cumsum()
df['wl_POI'] = gb.POI.cumsum()
Output
date StP COI POI wl_COI wl_POI
0 2021-01-01 500.0 1.0 3.0 14.0 3.0
1 2021-01-01 600.0 2.0 2.0 12.0 5.0
2 2021-01-01 700.0 3.0 4.0 9.0 9.0
3 2021-01-01 800.0 4.0 1.0 5.0 10.0
4 2021-01-01 900.0 5.0 5.0 0.0 15.0
5 2021-01-02 500.0 2.0 2.0 18.0 2.0
6 2021-01-02 600.0 3.0 1.0 15.0 3.0
7 2021-01-02 700.0 4.0 3.0 11.0 6.0
8 2021-01-02 800.0 5.0 0.0 6.0 6.0
9 2021-01-02 900.0 6.0 5.0 0.0 11.0
Upvotes: 0
Reputation: 39900
This should do the trick:
import pandas as pd
import numpy as np
def compute_wl_cio(df, row):
return sum(df['COI'].loc[np.logical_and(df['date']==row['date'], df['StP']>row['StP'])])
def compute_wl_pio(df, row):
return sum(df['POI'].loc[np.logical_and(df['date']==row['date'], df['StP']<=row['StP'])])
And here's the output:
date StP COI POI wl_COI wl_POI
0 2021-01-01 500.0 1.0 3.0 14.0 3.0
1 2021-01-01 600.0 2.0 2.0 12.0 5.0
2 2021-01-01 700.0 3.0 4.0 9.0 9.0
3 2021-01-01 800.0 4.0 1.0 5.0 10.0
4 2021-01-01 900.0 5.0 5.0 0.0 15.0
5 2021-01-02 500.0 2.0 2.0 18.0 2.0
6 2021-01-02 600.0 3.0 1.0 15.0 3.0
7 2021-01-02 700.0 4.0 3.0 11.0 6.0
8 2021-01-02 800.0 5.0 0.0 6.0 6.0
9 2021-01-02 900.0 6.0 5.0 0.0 11.0
You can even do it in just a single function:
import operator
def compute_wl(df, row, col, op):
return sum(df[col].loc[np.logical_and(df['date']==row['date'], op(df['StP'], row['StP']))])
df['wl_COI'] = df.apply(lambda row: compute_wl(df, row, 'COI', operator.gt), axis=1)
df['wl_POI'] = df.apply(lambda row: compute_wl(df, row, 'POI', operator.le), axis=1)
Upvotes: 0