Reputation: 1949
I have a dataframe with forest stand id, tree species, height and volume:
import pandas as pd
df=pd.DataFrame.from_items([('STAND_ID',[1,1,2,3,3,3]),('Species',['Conifer','Broadleaves','Conifer','Broadleaves','Conifer','Conifer']),
('Height',[20,19,13,24,25,18]),('Volume',[200,100,300,50,100,10])])
STAND_ID Species Height Volume
0 1 Conifer 20 200
1 1 Broadleaves 19 100
2 2 Conifer 13 300
3 3 Broadleaves 24 50
4 3 Conifer 25 100
5 3 Conifer 18 10
I want to groupby and unstack by stand id and calculate volume weighted mean height so i try:
newdf=df.groupby(['STAND_ID','Species']).mean().unstack()
Height Volume
Species Broadleaves Conifer Broadleaves Conifer
STAND_ID
1 19.0 20.0 100.0 200.0
2 NaN 13.0 NaN 300.0
3 24.0 21.5 50.0 55.0
The heights are of course not volume weighted means. How can i volume weight them? Like this for STAND_ID 3 and Conifer:
(25*100 + 18*10)/(100+10)=24.4
Upvotes: 4
Views: 13701
Reputation: 1427
If lambda functions are confusing apply
can also be used with a function definition. (And there is also a function numpy.average
to calculate weighted mean)
import numpy as np
def weighted_average(group):
weights = group['Volume']
height = group['Height']
return np.average(height,weights=weights)
df.groupby(['STAND_ID','Species']).apply(func = weighted_average).unstack()
Upvotes: 13
Reputation: 7038
If I understand correctly, one way would be performing a groupby
with apply
:
df
STAND_ID Species Height Volume
0 1 Conifer 20 200
1 1 Broadleaves 19 100
2 2 Conifer 13 300
3 3 Broadleaves 24 50
4 3 Conifer 25 100
5 3 Conifer 18 10
df.groupby(['STAND_ID','Species']).apply(lambda x: (x['Height'] * x['Volume'].div(x['Volume'].sum())).sum()).unstack()
Species Broadleaves Conifer
STAND_ID
1 19.0 20.000000
2 NaN 13.000000
3 24.0 24.363636
Upvotes: 6