Daniel Arges
Daniel Arges

Reputation: 365

How can I compute the cumulative weighted average in new column?

Read all related pages on google and stackoverflow, and I still can't find the solution..

Given this df fragment:

        key_br_acc_posid  lot_in         price
ix                                            
1      1_885020_76141036    0.03       1.30004
2      1_885020_76236801    0.02       1.15297
5      1_885020_76502318    0.50    2752.08000
8      1_885020_76502318    4.50    2753.93000
9      1_885020_76502318    0.50    2753.93000
                 ...     ...           ...
1042  1_896967_123068980    0.01       1.17657
1044  1_896967_110335293    0.01      28.07100
1047  1_896967_110335293    0.01      24.14000
1053  1_896967_146913299   25.00      38.55000
1054  1_896967_147039856    2.00  121450.00000

How can I create a new column w_avg_price computing the moving weighted average price by key_br_acc_posid? The lot_in is the weight and the price is the value.

I tried many approaches with groupby() + np.average() buy I have to avoid the data aggregation. I need this value in each row.

Upvotes: 1

Views: 159

Answers (2)

piterbarg
piterbarg

Reputation: 8219

groupby and then perform the calculation for each group using cumsum()s:

(df.groupby('key_br_acc_posid', as_index = False)
    .apply(lambda g: g.assign(w_avg_price = (g['lot_in']*g['price']).cumsum()/g['lot_in'].cumsum()))    
    .reset_index(level = 0, drop = True)
)

result:


        key_br_acc_posid    lot_in         price    w_avg_price
----  ------------------  --------  ------------  -------------
   1   1_885020_76141036      0.03       1.30004        1.30004
   2   1_885020_76236801      0.02       1.15297        1.15297
   5   1_885020_76502318      0.5     2752.08        2752.08
   8   1_885020_76502318      4.5     2753.93        2753.74
   9   1_885020_76502318      0.5     2753.93        2753.76
1044  1_896967_110335293      0.01      28.071         28.071
1047  1_896967_110335293      0.01      24.14          26.1055
1042  1_896967_123068980      0.01       1.17657        1.17657
1053  1_896967_146913299     25         38.55          38.55
1054  1_896967_147039856      2     121450         121450

Upvotes: 2

hotplasma
hotplasma

Reputation: 68

I don't think I'm calculating it right, but what you want is cumsum()

df = pd.DataFrame({'lot_in':[.1,.2,.3],'price':[1.0,1.25,1.3]})
df['mvg_avg'] = (df['lot_in'] * df['price']).cumsum()
print(df)

       lot_in  price  mvg_avg
    0     0.1   1.00     0.10
    1     0.2   1.25     0.35
    2     0.3   1.30     0.74

Upvotes: 0

Related Questions