DrTRD
DrTRD

Reputation: 1718

Cumulative Sum Up to Threshold

I have a two column pandas DataFrame, that looks something like the following:

import pandas as pd
df = pd.DataFrame([[5,100],[6,200],[7,250],[8,1000]],columns=['Price','Units'])

Given a threshold, of say 1500, I'd like to take the cumulative sum of the product of the two columns (until the total reaches the threshold of 1500) and divide by the cumulative sum of the 'Units' column such that the cumulative sum only sums up to 1500. I can implement this using for loops, but how would a panda do it?

To spell out the details, a bit:

df['Product'] = df.prod(axis=1)
df['CumSum'] = df['Product'].cumsum()

At index=1, the cumulative sum exceeds the threshold (1700 > 1500). We would then like to take only the number of units that gets to the threshold. For example, the result would be df:

Price   Units  Product  CumSum  CumSumWithThreshold
  5       100     500     500          500
  6       200    1200    1700         1500
  7       250    1750    3450            0
  8      1000    8000   11450            0

Given you have $1500 to spend, what's the average price per unit? In the above, you can afford 100 units at price $5 (total of $500) and (1000/6) units of price $6 (total of $1000 at this price). The average price per unit is therefore: $1500/(100 + (1000/6)) = $5.625...

Upvotes: 2

Views: 2376

Answers (2)

DrTRD
DrTRD

Reputation: 1718

I've been able to come close, but perhaps not exactly what a panda would do. Perhaps this answer will percolate some more creativity:

import pandas as pd
import numpy as np

Threshold = 1500
df = pd.DataFrame([[5,100],[6,200],[7,250],[8,1000]],columns=['Price','Units'])

df['Diff'] = df.prod(axis=1).cumsum() - Threshold
df['ThisUnits'] = df.apply(lambda x: x['Units'] if x['Diff']<0 else np.max([0,x['Units'] - x['Diff']/x['Price']]),axis=1)
print('Result: $%.2f' % (Threshold/df['ThisUnits'].sum()))

Any other ideas?

Upvotes: 2

crow_t_robot
crow_t_robot

Reputation: 524

I assume this is the type of for loop you have tried, but this does not require any calculations beyond the terminating row.

import pandas as pd
df = pd.DataFrame([[5,100],[6,200],[7,250],[8,1000]],columns=['Price','Units'])

def average_price_to_buy_thresh(df, thresh):
    total = thresh
    units = 0
    for i, (px, amt) in df.iterrows():
        prod = px*amt
        if prod<total:
            units += amt
            total -= prod
        else:
            units += total/px
            return (thresh/units)

print('Result: $%.2f' % average_price_to_buy_thresh(df, 1500))

Upvotes: 2

Related Questions