abdeltif-b
abdeltif-b

Reputation: 23

Pandas Dataframe Subtract Value From Previous Rows Based On Condition

I have the following Pandas dataframe:

UPDATE:

I slightly changed the example (Last row) to make the output clearer for @mozway

            value   initial_quantity    updated_quantity
date    
2021-09-01  50      100                 100
2021-10-01  50      100                 100
2021-11-01  150     100                 100
2021-12-01  110     100                 100

What I want to do is to iterate over the dataframe and subtract the value from updated_quantity column. If the result is negative then the updated_quantity cell should be 0 and the rest should be subtracted from the previous cells until either the rest equal to 0 or we reach the initial (first) row.

The expected output from the above example is the following:

            value   initial_quantity    updated_quantity
date    
2021-09-01  50      100                 40
2021-10-01  50      100                 0
2021-11-01  150     100                 0
2021-12-01  110     100                 0

A working solution that I tried is based on two for loops as follows:

import pandas as pd

# Demo data
d = {
    'date': ['2021-09-01', '2021-10-01', '2021-11-01', '2021-12-01'],
    'value': [50, 50, 150, 110],
    'initial_quantity': [100, 100, 100, 100], 
    'updated_quantity': [100, 100, 100, 100],
}
df = pd.DataFrame(data=d)
# Set date as index
df['date'] = pd.to_datetime(df['date'])
df = df.set_index('date')

# Iterate over rows and check conditions
for index, row in df.iterrows():
    value = row['value']

    for i, r in df.loc[index::-1].iterrows():
        if (value > 0):
            if (r['updated_quantity'] - value >= 0):
                df.at[i,'updated_quantity'] = r['updated_quantity'] - value
                value = 0
            else:
                value = value - r['updated_quantity']
                # If current index is the 1st index of the df then set the last value (it can be negative) and break
                if i != df.first_valid_index():
                    df.at[i,'updated_quantity'] = 0
                else:
                    df.at[i,'updated_quantity'] = -value
                    break
        else:
            break

It is so slow when applied on large data. Is there a better/faster way to do this? Maybe by using apply function?


In case the expected output isn't clear enough, here are the intermediate (row by row) dataframes.

            value   initial_quantity    updated_quantity
date    
2021-09-01  50      100                 50 #(100-50) = (old updated_quantity - value)
2021-10-01  50      100                 100
2021-11-01  150     100                 100
2021-12-01  110     100                 100

            value   initial_quantity    updated_quantity
date    
2021-09-01  50      100                 50
2021-10-01  50      100                 50 #(100-50) = (old updated_quantity - value)
2021-11-01  150     100                 100
2021-12-01  110     100                 100

            value   initial_quantity    updated_quantity
date    
2021-09-01  50      100                 50
2021-10-01  50      100                 0 #(50-50) = (The rest of calculation made in the previous '3rd' row - value)
2021-11-01  150     100                 0 #(100-150)* = (old updated_quantity - value)
2021-12-01  110     100                 100

            value   initial_quantity    updated_quantity
date    
2021-09-01  50      100                 40 #(50-10) = (old updated_quantity - The rest of calculation made in the previous '2nd' row)
2021-10-01  50      100                 0 #(0-10)*
2021-11-01  150     100                 0 #(0-10)*
2021-12-01  110     100                 0 #(100-110)*  (old updated_quantity - value)

(* if the result is negative then the value will be clipped to 0 
and the rest will be subtracted from the previous row)

Upvotes: 1

Views: 1089

Answers (1)

mozway
mozway

Reputation: 262484

You can use cumsum + clip:

df['updated_quantity'] = (df['initial_quantity']-df['value'].cumsum()).clip(lower=0)

output:

            value  initial_quantity  updated_quantity
date                                                 
2021-09-01     50               100                50
2021-10-01     50               100                 0
2021-11-01    150               100                 0
2021-12-01    200               100                 0

NB. do you really want negative values? Seems like -50 should be 50. Anyway, if you really want this, add .mul(-1) to the pipe.

Upvotes: 1

Related Questions