Reputation: 23
I have the following Pandas dataframe:
UPDATE:
I slightly changed the example (Last row) to make the output clearer for @mozway
value initial_quantity updated_quantity
date
2021-09-01 50 100 100
2021-10-01 50 100 100
2021-11-01 150 100 100
2021-12-01 110 100 100
What I want to do is to iterate over the dataframe and subtract the value from updated_quantity column. If the result is negative then the updated_quantity cell should be 0 and the rest should be subtracted from the previous cells until either the rest equal to 0 or we reach the initial (first) row.
The expected output from the above example is the following:
value initial_quantity updated_quantity
date
2021-09-01 50 100 40
2021-10-01 50 100 0
2021-11-01 150 100 0
2021-12-01 110 100 0
A working solution that I tried is based on two for loops as follows:
import pandas as pd
# Demo data
d = {
'date': ['2021-09-01', '2021-10-01', '2021-11-01', '2021-12-01'],
'value': [50, 50, 150, 110],
'initial_quantity': [100, 100, 100, 100],
'updated_quantity': [100, 100, 100, 100],
}
df = pd.DataFrame(data=d)
# Set date as index
df['date'] = pd.to_datetime(df['date'])
df = df.set_index('date')
# Iterate over rows and check conditions
for index, row in df.iterrows():
value = row['value']
for i, r in df.loc[index::-1].iterrows():
if (value > 0):
if (r['updated_quantity'] - value >= 0):
df.at[i,'updated_quantity'] = r['updated_quantity'] - value
value = 0
else:
value = value - r['updated_quantity']
# If current index is the 1st index of the df then set the last value (it can be negative) and break
if i != df.first_valid_index():
df.at[i,'updated_quantity'] = 0
else:
df.at[i,'updated_quantity'] = -value
break
else:
break
It is so slow when applied on large data. Is there a better/faster way to do this? Maybe by using apply function?
In case the expected output isn't clear enough, here are the intermediate (row by row) dataframes.
value initial_quantity updated_quantity
date
2021-09-01 50 100 50 #(100-50) = (old updated_quantity - value)
2021-10-01 50 100 100
2021-11-01 150 100 100
2021-12-01 110 100 100
value initial_quantity updated_quantity
date
2021-09-01 50 100 50
2021-10-01 50 100 50 #(100-50) = (old updated_quantity - value)
2021-11-01 150 100 100
2021-12-01 110 100 100
value initial_quantity updated_quantity
date
2021-09-01 50 100 50
2021-10-01 50 100 0 #(50-50) = (The rest of calculation made in the previous '3rd' row - value)
2021-11-01 150 100 0 #(100-150)* = (old updated_quantity - value)
2021-12-01 110 100 100
value initial_quantity updated_quantity
date
2021-09-01 50 100 40 #(50-10) = (old updated_quantity - The rest of calculation made in the previous '2nd' row)
2021-10-01 50 100 0 #(0-10)*
2021-11-01 150 100 0 #(0-10)*
2021-12-01 110 100 0 #(100-110)* (old updated_quantity - value)
(* if the result is negative then the value will be clipped to 0
and the rest will be subtracted from the previous row)
Upvotes: 1
Views: 1089
Reputation: 262484
df['updated_quantity'] = (df['initial_quantity']-df['value'].cumsum()).clip(lower=0)
output:
value initial_quantity updated_quantity
date
2021-09-01 50 100 50
2021-10-01 50 100 0
2021-11-01 150 100 0
2021-12-01 200 100 0
NB. do you really want negative values? Seems like -50 should be 50. Anyway, if you really want this, add .mul(-1)
to the pipe.
Upvotes: 1