Roooobz
Roooobz

Reputation: 55

How can I create a new column of values based on the grouped sum of values from two other columns?

First off, my apologies if my question wasn't worded the best.

I will use a sample dataframe to illustrate my question.

medals = pd.DataFrame({'Year':[2010,2010,2010,2010,2010,2010,2014,2014,2014,2014,2014,2014,2018,2018,2018,2018,2018,2018],'Country': ['Canada','Canada','USA','USA','Germany','Germany','Canada','Canada','USA','USA','Germany','Germany','Canada','Canada','USA','USA','Germany','Germany'],'Sex': ['female','male','female','male','female','male','female','male','female','male','female','male','female','male','female','male','female','male'],
'No. of medals': [2,4,2,0,3,0,1,1,3,2,4,4,1,3,2,2,1,3]})

Suppose I have this dataframe of countries and the number of medals they've won in the Olympics:

    Year  Country     Sex  No. of medals
0   2010   Canada  female              2
1   2010   Canada    male              4
2   2010      USA  female              2
3   2010      USA    male              0
4   2010  Germany  female              3
5   2010  Germany    male              0
6   2014   Canada  female              1
7   2014   Canada    male              1
8   2014      USA  female              3
9   2014      USA    male              2
10  2014  Germany  female              4
11  2014  Germany    male              4
12  2018   Canada  female              1
13  2018   Canada    male              3
14  2018      USA  female              2
15  2018      USA    male              2
16  2018  Germany  female              1
17  2018  Germany    male              3 

let's say I want to add a column that shows total number of medals won by the country that year:

    Year  Country     Sex  No. of medals  Total medals
0   2010   Canada  female              2             6
1   2010   Canada    male              4             6
2   2010      USA  female              2             2
3   2010      USA    male              0             2
4   2010  Germany  female              3             3
5   2010  Germany    male              0             3
6   2014   Canada  female              1             2
7   2014   Canada    male              1             2
8   2014      USA  female              3             5
9   2014      USA    male              2             5
10  2014  Germany  female              4             8
11  2014  Germany    male              4             8
12  2018   Canada  female              1             4
13  2018   Canada    male              3             4
14  2018      USA  female              2             4
15  2018      USA    male              2             4
16  2018  Germany  female              1             4
17  2018  Germany    male              3             4

How would I go about doing that? I've gotten as far as grouping by country and year and getting the sum, but I'm not sure how to map it to the Year and Country column.

medals.groupby(['Year','Country'])['No. of medals'].sum()

gives me this:

Year  Country
2010  Canada     6
      Germany    3
      USA        2
2014  Canada     2
      Germany    8
      USA        5
2018  Canada     4
      Germany    4
      USA        4
Name: No. of medals, dtype: int64

Would really appreciate any tips and pointers. Thanks!

Upvotes: 2

Views: 44

Answers (2)

PiyushC
PiyushC

Reputation: 308

You almost got it.

>>> medals_sum = medals.groupby(["Year", "Country"])["No. of medals"].sum().reset_index()
>>> medals_sum = medals_sum.rename(columns={"No. of medals": "Total medals"})
>>> medals.merge(medals_sum, on=["Year", "Country"])

    Year  Country     Sex  No. of medals  Total medals
0   2010   Canada  female              2             6
1   2010   Canada    male              4             6
2   2010      USA  female              2             2
3   2010      USA    male              0             2
4   2010  Germany  female              3             3
5   2010  Germany    male              0             3
6   2014   Canada  female              1             2
7   2014   Canada    male              1             2
8   2014      USA  female              3             5
9   2014      USA    male              2             5
10  2014  Germany  female              4             8
11  2014  Germany    male              4             8
12  2018   Canada  female              1             4
13  2018   Canada    male              3             4
14  2018      USA  female              2             4
15  2018      USA    male              2             4
16  2018  Germany  female              1             4
17  2018  Germany    male              3             4

Upvotes: 0

wwnde
wwnde

Reputation: 26676

Use groupby transform

medals['Total medals']=medals.groupby(['Country','Year'])['No. of medals'].transform('sum')
print(medals)




  Year  Country     Sex  No. of medals  Total medals
0   2010   Canada  female              2             6
1   2010   Canada    male              4             6
2   2010      USA  female              2             2
3   2010      USA    male              0             2
4   2010  Germany  female              3             3
5   2010  Germany    male              0             3
6   2014   Canada  female              1             2
7   2014   Canada    male              1             2
8   2014      USA  female              3             5
9   2014      USA    male              2             5
10  2014  Germany  female              4             8
11  2014  Germany    male              4             8
12  2018   Canada  female              1             4
13  2018   Canada    male              3             4
14  2018      USA  female              2             4
15  2018      USA    male              2             4
16  2018  Germany  female              1             4
17  2018  Germany    male              3             4

Upvotes: 1

Related Questions