Reputation: 1718
I have a two column pandas DataFrame, that looks something like the following:
import pandas as pd
df = pd.DataFrame([[5,100],[6,200],[7,250],[8,1000]],columns=['Price','Units'])
Given a threshold, of say 1500, I'd like to take the cumulative sum of the product of the two columns (until the total reaches the threshold of 1500) and divide by the cumulative sum of the 'Units' column such that the cumulative sum only sums up to 1500. I can implement this using for loops, but how would a panda do it?
To spell out the details, a bit:
df['Product'] = df.prod(axis=1)
df['CumSum'] = df['Product'].cumsum()
At index=1, the cumulative sum exceeds the threshold (1700 > 1500). We would then like to take only the number of units that gets to the threshold. For example, the result would be df:
Price Units Product CumSum CumSumWithThreshold
5 100 500 500 500
6 200 1200 1700 1500
7 250 1750 3450 0
8 1000 8000 11450 0
Given you have $1500 to spend, what's the average price per unit? In the above, you can afford 100 units at price $5 (total of $500) and (1000/6) units of price $6 (total of $1000 at this price). The average price per unit is therefore: $1500/(100 + (1000/6)) = $5.625...
Upvotes: 2
Views: 2376
Reputation: 1718
I've been able to come close, but perhaps not exactly what a panda would do. Perhaps this answer will percolate some more creativity:
import pandas as pd
import numpy as np
Threshold = 1500
df = pd.DataFrame([[5,100],[6,200],[7,250],[8,1000]],columns=['Price','Units'])
df['Diff'] = df.prod(axis=1).cumsum() - Threshold
df['ThisUnits'] = df.apply(lambda x: x['Units'] if x['Diff']<0 else np.max([0,x['Units'] - x['Diff']/x['Price']]),axis=1)
print('Result: $%.2f' % (Threshold/df['ThisUnits'].sum()))
Any other ideas?
Upvotes: 2
Reputation: 524
I assume this is the type of for loop you have tried, but this does not require any calculations beyond the terminating row.
import pandas as pd
df = pd.DataFrame([[5,100],[6,200],[7,250],[8,1000]],columns=['Price','Units'])
def average_price_to_buy_thresh(df, thresh):
total = thresh
units = 0
for i, (px, amt) in df.iterrows():
prod = px*amt
if prod<total:
units += amt
total -= prod
else:
units += total/px
return (thresh/units)
print('Result: $%.2f' % average_price_to_buy_thresh(df, 1500))
Upvotes: 2