Reputation: 10061
Let's say we have a students' score data df1
and credit data df2
as follows:
df1:
stu_id major Python English C++
0 U202010521 computer 56 81 82
1 U202010522 management 92 56 64
2 U202010523 management 95 88 81
3 U202010524 BigData&AI 79 53 74
4 U202010525 computer 53 71 -1
5 U202010526 computer 78 96 53
6 U202010527 BigData&AI 69 63 74
7 U202010528 BigData&AI 86 57 82
8 U202010529 BigData&AI 81 100 85
9 U202010530 BigData&AI 79 67 80
df2:
class credit
0 Python 2
1 English 4
2 C++ 3
I need to calculate weighted average
for each students' scores.
df2['credit_ratio'] = df2['credit']/9
Out:
class credit credit_ratio
0 Python 2 0.222222
1 English 4 0.444444
2 C++ 3 0.333333
ie., for U202010521
, his/her weighted score will be 56*0.22 + 81*0.44 + 82*0.33 = 75.02
, I need to calculate each student's weighted_score
as a new column, how could I do that in Pandas?
Upvotes: 4
Views: 112
Reputation: 2405
I can do it in several steps, complete workflow is below:
import pandas as pd
from io import StringIO
df = pd.read_csv(StringIO(
"""stu_id major Python English C++
U202010521 computer 56 81 82
U202010522 management 92 56 64
U202010523 management 95 88 81
U202010524 BigData&AI 79 53 74
U202010525 computer 53 71 -1
U202010526 computer 78 96 53
U202010527 BigData&AI 69 63 74
U202010528 BigData&AI 86 57 82
U202010529 BigData&AI 81 100 85
U202010530 BigData&AI 79 67 80"""), sep="\s+")
df2 = pd.read_csv(StringIO(
"""class credit
Python 2
English 4
C++ 3"""), sep="\s+")
df2['credit_ratio'] = df2['credit']/9
df3 = df.melt(id_vars=["stu_id", "major"])
df3["credit_ratio"] = df3["variable"].map(df2[["class", "credit_ratio"]].set_index("class").to_dict()["credit_ratio"])
df3["G"] = df3["value"] * df3["credit_ratio"]
>>> df3.groupby("stu_id")["G"].sum()
stu_id
U202010521 75.777778
U202010522 66.666667
U202010523 87.222222
U202010524 65.777778
U202010525 43.000000
U202010526 77.666667
U202010527 68.000000
U202010528 71.777778
U202010529 90.777778
U202010530 74.000000
Upvotes: 1
Reputation: 35686
Try with set_index
+ mul
then sum
on axis=1:
df1['weighted_score'] = (
df1[df2['class']].mul(df2.set_index('class')['credit_ratio']).sum(axis=1)
)
df1
:
stu_id major Python English C++ weighted_score
0 U202010521 computer 56 81 82 75.777778
1 U202010522 management 92 56 64 66.666667
2 U202010523 management 95 88 81 87.222222
3 U202010524 BigData&AI 79 53 74 65.777778
4 U202010525 computer 53 71 -1 43.000000
5 U202010526 computer 78 96 53 77.666667
6 U202010527 BigData&AI 69 63 74 68.000000
7 U202010528 BigData&AI 86 57 82 71.777778
8 U202010529 BigData&AI 81 100 85 90.777778
9 U202010530 BigData&AI 79 67 80 74.000000
Explaination:
By setting the index of df2 to class, multiplication will now align correctly with the columns of df1:
df2.set_index('class')['credit_ratio']
class
Python 0.222222
English 0.444444
C++ 0.333333
Name: credit_ratio, dtype: float64
Select the specific columns from df1
using the values from df2
:
df1[df2['class']]
Python English C++
0 56 81 82
1 92 56 64
2 95 88 81
3 79 53 74
4 53 71 -1
5 78 96 53
6 69 63 74
7 86 57 82
8 81 100 85
9 79 67 80
Multiply to apply the weights:
df1[df2['class']].mul(df2.set_index('class')['credit_ratio'])
Python English C++
0 12.444444 36.000000 27.333333
1 20.444444 24.888889 21.333333
2 21.111111 39.111111 27.000000
3 17.555556 23.555556 24.666667
4 11.777778 31.555556 -0.333333
5 17.333333 42.666667 17.666667
6 15.333333 28.000000 24.666667
7 19.111111 25.333333 27.333333
8 18.000000 44.444444 28.333333
9 17.555556 29.777778 26.666667
Then sum across rows to get the total value.
df1[df2['class']].mul(df2.set_index('class')['credit_ratio']).sum(axis=1)
0 75.777778
1 66.666667
2 87.222222
3 65.777778
4 43.000000
5 77.666667
6 68.000000
7 71.777778
8 90.777778
9 74.000000
dtype: float64
Upvotes: 5