Reputation: 119
I'm trying to calculate a weighted average for multiple columns in a dataframe. This is a sample of my data
Group | Year | Month | Weight(kg) | Nitrogen | Calcium |
---|---|---|---|---|---|
A | 2020 | 01 | 10000 | 10 | 70 |
A | 2020 | 01 | 15000 | 4 | 78 |
A | 2021 | 05 | 12000 | 5 | 66 |
A | 2021 | 05 | 10000 | 8 | 54 |
B | 2021 | 08 | 14000 | 10 | 90 |
C | 2021 | 08 | 50000 | 20 | 92 |
C | 2021 | 08 | 40000 | 10 | 95 |
My desired result would look something like this:
What I've tried: I can get the correct weighted average values for a single column using this function: (similar to: link)
def wavg(df, value, weight):
d = df[value]
w = df[weight]
try:
return (d * w).sum() / w.sum()
except ZeroDivisionError:
return d.mean()
I can apply this function to a single column of my df:
df2 = df.groupby(["Group", "year", "month"]).apply(wavg, "Calcium", "Weight(kg").to_frame()
(Don't mind the different values, they are correct for the data in my notebook)
The obvious problem is that this function only works for a single column whilst I have a douzens of columns. I therefore tried a for loop:
column_list=[]
for column in df.columns:
column_list.append(df.groupby(["Group", "year", "month"]).apply(wavg, column, "Weight(kg").to_frame())
It calculates the values correctly, but the columns are placed on top of each other instead of next to eachother. They also miss a usefull column name:
How could I adapt my code to return the desired df?
Upvotes: 1
Views: 1701
Reputation: 862511
Change function for working by multiple columns and for avoid removing column for grouping are converting to MultiIndex
:
def wavg(x, value, weight):
d = x[value]
w = x[weight]
try:
return (d.mul(w, axis=0)).div(w.sum())
except ZeroDivisionError:
return d.mean()
#columns used for groupby
groups = ["Group", "Year", "Month"]
#processing all another columns
cols = df.columns.difference(groups + ["Weight(kg)"], sort=False)
#create index and processing all columns by variable cols
df1 = (df.set_index(groups)
.groupby(level=groups)
.apply(wavg, cols, "Weight(kg)")
.reset_index())
print (df2)
Group Year Month Calcium Nitrogen
0 A 2020 1 28.000000 4.000000
1 A 2020 1 46.800000 2.400000
2 A 2021 5 36.000000 2.727273
3 A 2021 5 24.545455 3.636364
4 B 2021 8 90.000000 10.000000
5 C 2021 8 51.111111 11.111111
6 C 2021 8 42.222222 4.444444
Upvotes: 1
Reputation: 24314
Try via concat()
and reset_index()
:
df=pd.concat(column_list,axis=1).reset_index()
OR
you can make changes here:
column_list=[]
for column in df.columns:
column_list.append(df.groupby(["Group", "year", "month"]).apply(wavg, column, "Weight(kg").reset_index())
#Finally:
df=pd.concat(column_list,axis=1)
Upvotes: 1