ah bon
ah bon

Reputation: 10061

Calculate weighted average results for multiple columns based on another dataframe in Pandas

Let's say we have a students' score data df1 and credit data df2 as follows:

df1:

       stu_id       major  Python  English  C++
0  U202010521    computer      56       81   82
1  U202010522  management      92       56   64
2  U202010523  management      95       88   81
3  U202010524  BigData&AI      79       53   74
4  U202010525    computer      53       71   -1
5  U202010526    computer      78       96   53
6  U202010527  BigData&AI      69       63   74
7  U202010528  BigData&AI      86       57   82
8  U202010529  BigData&AI      81      100   85
9  U202010530  BigData&AI      79       67   80

df2:

     class  credit
0   Python       2
1  English       4
2      C++       3

I need to calculate weighted average for each students' scores.

df2['credit_ratio'] = df2['credit']/9

Out:

     class  credit  credit_ratio
0   Python       2      0.222222
1  English       4      0.444444
2      C++       3      0.333333

ie., for U202010521, his/her weighted score will be 56*0.22 + 81*0.44 + 82*0.33 = 75.02, I need to calculate each student's weighted_score as a new column, how could I do that in Pandas?

Upvotes: 4

Views: 112

Answers (2)

crayxt
crayxt

Reputation: 2405

I can do it in several steps, complete workflow is below:

import pandas as pd
from io import StringIO

df = pd.read_csv(StringIO(
"""stu_id       major  Python  English  C++
U202010521    computer      56       81   82
U202010522  management      92       56   64
U202010523  management      95       88   81
U202010524  BigData&AI      79       53   74
U202010525    computer      53       71   -1
U202010526    computer      78       96   53
U202010527  BigData&AI      69       63   74
U202010528  BigData&AI      86       57   82
U202010529  BigData&AI      81      100   85
U202010530  BigData&AI      79       67   80"""), sep="\s+")


df2 = pd.read_csv(StringIO(
"""class  credit
Python       2
English       4
C++       3"""), sep="\s+")

df2['credit_ratio'] = df2['credit']/9

df3 = df.melt(id_vars=["stu_id", "major"])

df3["credit_ratio"] = df3["variable"].map(df2[["class", "credit_ratio"]].set_index("class").to_dict()["credit_ratio"])
df3["G"] = df3["value"] * df3["credit_ratio"]

>>> df3.groupby("stu_id")["G"].sum()
stu_id
U202010521    75.777778
U202010522    66.666667
U202010523    87.222222
U202010524    65.777778
U202010525    43.000000
U202010526    77.666667
U202010527    68.000000
U202010528    71.777778
U202010529    90.777778
U202010530    74.000000

Upvotes: 1

Henry Ecker
Henry Ecker

Reputation: 35686

Try with set_index + mul then sum on axis=1:

df1['weighted_score'] = (
    df1[df2['class']].mul(df2.set_index('class')['credit_ratio']).sum(axis=1)
)

df1:

       stu_id       major  Python  English  C++  weighted_score
0  U202010521    computer      56       81   82       75.777778
1  U202010522  management      92       56   64       66.666667
2  U202010523  management      95       88   81       87.222222
3  U202010524  BigData&AI      79       53   74       65.777778
4  U202010525    computer      53       71   -1       43.000000
5  U202010526    computer      78       96   53       77.666667
6  U202010527  BigData&AI      69       63   74       68.000000
7  U202010528  BigData&AI      86       57   82       71.777778
8  U202010529  BigData&AI      81      100   85       90.777778
9  U202010530  BigData&AI      79       67   80       74.000000

Explaination:

By setting the index of df2 to class, multiplication will now align correctly with the columns of df1:

df2.set_index('class')['credit_ratio']
class
Python     0.222222
English    0.444444
C++        0.333333
Name: credit_ratio, dtype: float64

Select the specific columns from df1 using the values from df2:

df1[df2['class']]
   Python  English  C++
0      56       81   82
1      92       56   64
2      95       88   81
3      79       53   74
4      53       71   -1
5      78       96   53
6      69       63   74
7      86       57   82
8      81      100   85
9      79       67   80

Multiply to apply the weights:

df1[df2['class']].mul(df2.set_index('class')['credit_ratio'])
      Python    English        C++
0  12.444444  36.000000  27.333333
1  20.444444  24.888889  21.333333
2  21.111111  39.111111  27.000000
3  17.555556  23.555556  24.666667
4  11.777778  31.555556  -0.333333
5  17.333333  42.666667  17.666667
6  15.333333  28.000000  24.666667
7  19.111111  25.333333  27.333333
8  18.000000  44.444444  28.333333
9  17.555556  29.777778  26.666667

Then sum across rows to get the total value.

df1[df2['class']].mul(df2.set_index('class')['credit_ratio']).sum(axis=1)
0    75.777778
1    66.666667
2    87.222222
3    65.777778
4    43.000000
5    77.666667
6    68.000000
7    71.777778
8    90.777778
9    74.000000
dtype: float64

Upvotes: 5

Related Questions