Cumulative Sum Up to Threshold

Question

I have a two column pandas DataFrame, that looks something like the following:

import pandas as pd
df = pd.DataFrame([[5,100],[6,200],[7,250],[8,1000]],columns=['Price','Units'])

Given a threshold, of say 1500, I'd like to take the cumulative sum of the product of the two columns (until the total reaches the threshold of 1500) and divide by the cumulative sum of the 'Units' column such that the cumulative sum only sums up to 1500. I can implement this using for loops, but how would a panda do it?

To spell out the details, a bit:

df['Product'] = df.prod(axis=1)
df['CumSum'] = df['Product'].cumsum()

At index=1, the cumulative sum exceeds the threshold (1700 > 1500). We would then like to take only the number of units that gets to the threshold. For example, the result would be df:

Price   Units  Product  CumSum  CumSumWithThreshold
  5       100     500     500          500
  6       200    1200    1700         1500
  7       250    1750    3450            0
  8      1000    8000   11450            0

Given you have $1500 to spend, what's the average price per unit? In the above, you can afford 100 units at price $5 (total of $500) and (1000/6) units of price $6 (total of $1000 at this price). The average price per unit is therefore: $1500/(100 + (1000/6)) = $5.625...

DrTRD · Accepted Answer

I've been able to come close, but perhaps not exactly what a panda would do. Perhaps this answer will percolate some more creativity:

import pandas as pd
import numpy as np

Threshold = 1500
df = pd.DataFrame([[5,100],[6,200],[7,250],[8,1000]],columns=['Price','Units'])

df['Diff'] = df.prod(axis=1).cumsum() - Threshold
df['ThisUnits'] = df.apply(lambda x: x['Units'] if x['Diff']<0 else np.max([0,x['Units'] - x['Diff']/x['Price']]),axis=1)
print('Result: $%.2f' % (Threshold/df['ThisUnits'].sum()))

Any other ideas?

Cumulative Sum Up to Threshold

Answers (2)

Related Questions