Faster way to calculate the weighted mean based on rolling offset

Question

I have an example DataFrame: df = pd.DataFrame({'price': [100, 101, 99, 95, 97, 88], 'qty': [12, 5, 1, 3, 1, 3]}). I want the qty-weighted average price, sum(price * qty) / sum(qty), over a rolling window covering the last 5 units of qty. The desired output is 100, 101, 100.6, 97, 96.2, 91.2.

Unfortunately, I don't have a good way to calculate this at the moment. I have a slow approach that gets close: compute the cumulative sum of qty, then for each row x evaluate df.qty_cumsum[(df.qty_cumsum<= x.qty_cumsum- 5)].argmax(), which returns the position of the last row whose cumulative qty is at least 5 below the current row's. I can then use that position to calculate the weighted average in a second step.

Thanks

perl · Accepted Answer

One option is to repeat each price qty times, take a 5-row rolling mean over the expanded series, then group by the original index and keep the last value in each group:

np.repeat(df['price'], df['qty']).rolling(5).mean().groupby(level=0).last()

Output:

0    100.0
1    101.0
2    100.6
3     97.0
4     96.2
5     91.2
Name: price, dtype: float64
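
To see why this works, the one-liner can be unpacked into its three steps. This is only a sketch on the example data from the question; the intermediate names units and unit_mean are made up for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'price': [100, 101, 99, 95, 97, 88],
                   'qty': [12, 5, 1, 3, 1, 3]})

# Step 1: one row per unit. Each repeated row keeps its original index label,
# so the 25 unit rows carry labels 0 (x12), 1 (x5), 2, 3 (x3), 4, 5 (x3).
units = np.repeat(df['price'], df['qty'])

# Step 2: a plain 5-row rolling mean. Over unit rows this is the mean price
# of the last 5 units. The first 4 rows are NaN (window not yet full).
unit_mean = units.rolling(5).mean()

# Step 3: for each original label, keep the value at its last unit row,
# i.e. the rolling mean "as of" that row of df.
result = unit_mean.groupby(level=0).last()
print(result)
```

Note that the expanded series has sum(qty) rows, which is what makes large qty values expensive.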

P.S. If qty values can be large, it is probably more efficient to clip qty to 5 before repeating. A 5-unit window never reaches past the last 5 repeats of any row, so a qty of 12 gives the same result as a qty of 5:

np.repeat(df['price'], np.clip(df['qty'], 0, 5)
         ).rolling(5).mean().groupby(level=0).last()
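
If even the clipped expansion is too large, a different technique (not the one above, and closer to the cumulative-sum idea in the question) may avoid repeating rows altogether. The sketch below assumes every qty is positive; rolling_unit_mean and the wavg column are hypothetical names. It relies on the fact that the running total of price * qty, viewed as a function of the running unit count, is piecewise linear, so np.interp can read it off at "current units minus 5":

```python
import numpy as np
import pandas as pd

def rolling_unit_mean(price, qty, window=5):
    """Mean price of the last `window` units up to each row (hypothetical helper).

    Assumes all qty values are positive.
    """
    price = np.asarray(price, dtype=float)
    qty = np.asarray(qty, dtype=float)
    # Running totals with a leading zero: position k holds totals after k rows.
    cum_qty = np.concatenate(([0.0], np.cumsum(qty)))
    cum_val = np.concatenate(([0.0], np.cumsum(price * qty)))
    end = cum_qty[1:]      # units seen up to and including each row
    start = end - window   # unit count at which each window begins
    # Cumulative value is piecewise linear in the unit count, so linear
    # interpolation gives its value at `start`.
    val_at_start = np.interp(start, cum_qty, cum_val)
    out = (cum_val[1:] - val_at_start) / window
    out[start < 0] = np.nan  # fewer than `window` units available so far
    return out

df = pd.DataFrame({'price': [100, 101, 99, 95, 97, 88],
                   'qty': [12, 5, 1, 3, 1, 3]})
df['wavg'] = rolling_unit_mean(df['price'], df['qty'])
print(df)
```

On the example data this should agree with the repeat-based result up to floating-point rounding, while working on arrays of the original length only.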
