group one column according to another column and sum of third column

Question

A dataframe looks like this:

df = pd.DataFrame({'name':["a"," b", "c","d", "e","a"," a", "a"," b", "c","d", "e","a"," a"],
       'gender': ["male", "female", "female", "female", "male","male","male","female","female", 
                  "female", "male","male","male", "male"],
      'year':[2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2019],
      'month':[1, 12, 4, 3, 6, 7, 2, 4, 5, 1, 12, 4, 3, 6 ],
      'count':[100, 30, 10, 90, 34, 100, 30, 10, 90, 34, 100, 30, 10, 90]})

It shows the name, gender, birth year and the birth month, and the number of people.

For example, in 2005 January there were 100 babies named “a”. I want to find top-10 frequent names for males and females, just as below:

df.sort_values(['gender','count'],ascending=False,inplace=True)
male = df[df['gender']=='male']['name'].head(10).to_list()
female = df[df['gender']=='female']['name'].head(10).to_list()
results = pd.DataFrame({'Male':pd.Series(male),'Female':pd.Series(female)})
print (results)

but apparently it gives duplicated amounts for example 5 a instead of total amount for a.I need to sum them up for example we have 100 'a' in 2005 and 100 'a' in 2009 and so on. so in total we need to add them up like 100+ 100 and.the final result must be like a is male and totally in 10 years 500 people were named a and so on.

KaRaOkEy · Accepted Answer

I think you are looking for a multiple groupby:

by_year = df.groupby(['year', 'gender', 'name']).agg({'count': 'sum'})
print(by_year)

total = by_year.groupby(['gender', 'name']).agg({'count': 'sum'})
print(total)

If not, please further specify an expected result.

group one column according to another column and sum of third column

Answers (2)

Related Questions