Make two cumulative sum columns reset whenever one of the two resets

Question

I am stuck on the following problem:

I have this dataset (dummy values)

item    price    units
A         10       1
B         20       5
C         5        8
D         6        3

Out of it, I calculate the cumulative sum of both via:

threshold_price = 31
threshold_units = 8

ls = [] 
cumsum = 0 
lastreset = 0 
for _, row in df.iterrows(): 
    if cumsum + row.price <= threshold_price: 
        cumsum += row.price 
    else: 
        last_reset = cumsum 
        cumsum = row.price 
    ls.append(cumsum) 
df['cs_price'] = ls

ls = [] 
cumsum = 0 
lastreset = 0 
for _, row in df.iterrows(): 
    if cumsum + row.units <= threshold_units: 
        cumsum += row.units 
    else: 
        last_reset = cumsum 
        cumsum = row.units 
    ls.append(cumsum) 
df['cs_units'] = ls

From here I get:

item    price    units   cs_price   cs_units
A         10       1        10         1
B         20       5        30         6
C         5        8         5         8
D         6        3        11         3

I want to have a new column which we can call "Store" which would be equal to the first reset of any of the two columns and incrementing for each reset.

Given above example:

item    price    units   cs_price   cs_units   store
A         10       1        10         1         0
B         20       5        30         6         0
C         5        8        5          8         1 #+1 because of price
D         6        3        11         3         2 #+1 because of units

The main issue I have through this approach is that, cs_price and cs_units will not reset when a new store is attributed, which is something I need.

Ideal output should be following:

   item    price    units   cs_price   cs_units   store
    A         10       1        10         1         0
    B         20       5        30         6         0
    C         5        8        5          8         1 #+1 because of price, cs_units resets
    D         6        3        6          3         2 #+1 because of units, cs_price resets
    E         7        4        13         7         2
    F         8        2        8          2         3
    G         20       2        28         4         3
    H         6        3        6          3         4

My current output (which is not what I want) is:

   item    price    units   cs_price   cs_units   store
    A         10       1        10         1         0
    B         20       5        30         6         0
    C         5        8        5          8         1 #+1 because of cs_price
    D         6        3        11         3         1 
    E         7        4        18         7         1
    F         8        2        26         2         2 #+1 because of cs_units but cs_price does not reset 
    G         20       2        20         4         2
    H         6        3        26         7         2

Full code (I did not include match columns above, I hope it is still readable):

threshold_price = 31
threshold_units = 8

ls = [] 
cumsum = 0 
lastreset = 0 
for _, row in df.iterrows(): 
    if cumsum + row.price <= threshold_price: 
        cumsum += row.price 
    else: 
        last_reset = cumsum 
        cumsum = row.price 
    ls.append(cumsum) 
df['cs_price'] = ls

ls = [] 
cumsum = 0 
lastreset = 0 
for _, row in df.iterrows(): 
    if cumsum + row.units <= threshold_units: 
        cumsum += row.units 
    else: 
        last_reset = cumsum 
        cumsum = row.units 
    ls.append(cumsum) 
df['cs_units'] = ls

df['match'] = df.cs_price < df.cs_price.shift()
df["match"] = df["match"].astype(int)

df['match2'] = df.cs_units < df.cs_units.shift()
df["match2"] = df["match2"].astype(int)

df["store_prep"] = df["match"].cumsum()
df["store_prep2"] = df["match2"].cumsum()
df["store"] = df[["store_prep","store_prep2"]].max(axis=1)

StevenS · Accepted Answer

Because your "reset" condition is dependent on both the price and units columns, you should loop through both columns together. During the loop, keep track of when a reset occurs, then do a cumsum on that reset list to get your store column.

Try this:

import pandas as pd
import numpy as np

df = pd.DataFrame({'item': ['A','B','C','D','E','F','G','H'], 
                   'price': [10,20,5,6,7,8,20,6], 
                   'units': [1,5,8,3,4,2,2,3]})

threshold_price = 31
threshold_units = 8

reset = []

list_price = [] 
list_units = []

cs_price = 0 
cs_units = 0

for price, units in zip(df['price'].to_list(), df['units'].to_list()): 
    if cs_price + price > threshold_price or cs_units + units > threshold_units:
        reset.append(True)
        cs_price = price
        cs_units = units              
    else:
        reset.append(False)
        cs_price += price  
        cs_units += units        
    list_price.append(cs_price)
    list_units.append(cs_units)
    
df['cs_price'] = list_price
df['cs_units'] = list_units

df['store'] = np.cumsum(reset)

print(df)

  item  price  units  cs_price  cs_units  store
0    A     10      1        10         1      0
1    B     20      5        30         6      0
2    C      5      8         5         8      1
3    D      6      3         6         3      2
4    E      7      4        13         7      2
5    F      8      2         8         2      3
6    G     20      2        28         4      3
7    H      6      3         6         3      4

Make two cumulative sum columns reset whenever one of the two resets

Answers (1)

Related Questions