Reputation: 55
First off, my apologies if my question wasn't worded the best.
I will use a sample dataframe to illustrate my question.
medals = pd.DataFrame({'Year':[2010,2010,2010,2010,2010,2010,2014,2014,2014,2014,2014,2014,2018,2018,2018,2018,2018,2018],'Country': ['Canada','Canada','USA','USA','Germany','Germany','Canada','Canada','USA','USA','Germany','Germany','Canada','Canada','USA','USA','Germany','Germany'],'Sex': ['female','male','female','male','female','male','female','male','female','male','female','male','female','male','female','male','female','male'],
'No. of medals': [2,4,2,0,3,0,1,1,3,2,4,4,1,3,2,2,1,3]})
Suppose I have this dataframe of countries and the number of medals they've won in the Olympics:
Year Country Sex No. of medals
0 2010 Canada female 2
1 2010 Canada male 4
2 2010 USA female 2
3 2010 USA male 0
4 2010 Germany female 3
5 2010 Germany male 0
6 2014 Canada female 1
7 2014 Canada male 1
8 2014 USA female 3
9 2014 USA male 2
10 2014 Germany female 4
11 2014 Germany male 4
12 2018 Canada female 1
13 2018 Canada male 3
14 2018 USA female 2
15 2018 USA male 2
16 2018 Germany female 1
17 2018 Germany male 3
let's say I want to add a column that shows total number of medals won by the country that year:
Year Country Sex No. of medals Total medals
0 2010 Canada female 2 6
1 2010 Canada male 4 6
2 2010 USA female 2 2
3 2010 USA male 0 2
4 2010 Germany female 3 3
5 2010 Germany male 0 3
6 2014 Canada female 1 2
7 2014 Canada male 1 2
8 2014 USA female 3 5
9 2014 USA male 2 5
10 2014 Germany female 4 8
11 2014 Germany male 4 8
12 2018 Canada female 1 4
13 2018 Canada male 3 4
14 2018 USA female 2 4
15 2018 USA male 2 4
16 2018 Germany female 1 4
17 2018 Germany male 3 4
How would I go about doing that? I've gotten as far as grouping by country and year and getting the sum, but I'm not sure how to map it to the Year and Country column.
medals.groupby(['Year','Country'])['No. of medals'].sum()
gives me this:
Year Country
2010 Canada 6
Germany 3
USA 2
2014 Canada 2
Germany 8
USA 5
2018 Canada 4
Germany 4
USA 4
Name: No. of medals, dtype: int64
Would really appreciate any tips and pointers. Thanks!
Upvotes: 2
Views: 44
Reputation: 308
You almost got it.
>>> medals_sum = medals.groupby(["Year", "Country"])["No. of medals"].sum().reset_index()
>>> medals_sum = medals_sum.rename(columns={"No. of medals": "Total medals"})
>>> medals.merge(medals_sum, on=["Year", "Country"])
Year Country Sex No. of medals Total medals
0 2010 Canada female 2 6
1 2010 Canada male 4 6
2 2010 USA female 2 2
3 2010 USA male 0 2
4 2010 Germany female 3 3
5 2010 Germany male 0 3
6 2014 Canada female 1 2
7 2014 Canada male 1 2
8 2014 USA female 3 5
9 2014 USA male 2 5
10 2014 Germany female 4 8
11 2014 Germany male 4 8
12 2018 Canada female 1 4
13 2018 Canada male 3 4
14 2018 USA female 2 4
15 2018 USA male 2 4
16 2018 Germany female 1 4
17 2018 Germany male 3 4
Upvotes: 0
Reputation: 26676
Use groupby
transform
medals['Total medals']=medals.groupby(['Country','Year'])['No. of medals'].transform('sum')
print(medals)
Year Country Sex No. of medals Total medals
0 2010 Canada female 2 6
1 2010 Canada male 4 6
2 2010 USA female 2 2
3 2010 USA male 0 2
4 2010 Germany female 3 3
5 2010 Germany male 0 3
6 2014 Canada female 1 2
7 2014 Canada male 1 2
8 2014 USA female 3 5
9 2014 USA male 2 5
10 2014 Germany female 4 8
11 2014 Germany male 4 8
12 2018 Canada female 1 4
13 2018 Canada male 3 4
14 2018 USA female 2 4
15 2018 USA male 2 4
16 2018 Germany female 1 4
17 2018 Germany male 3 4
Upvotes: 1