Reputation: 3
We have a df called allyears that has years, gender, names in it.
Something like this:
name | sex | number | year |
---|---|---|---|
John | M | 1 | 2010 |
Jane | F | 2 | 2011 |
I want to group the top10 names for a given year with their respective counts. I tried this code, but this is not returning what I am looking for.
males = allyears[(allyears.year>2009)&(allyears.sex=='M')]
maleNameCounts = pd.DataFrame(males.groupby(['year', 'name']).count())
maleNameCounts.sort_values('number', ascending=True)
How should I be approaching this problem?
Upvotes: 0
Views: 65
Reputation: 632
You can try following:
males = allyears[(allyears.year>2009)&(allyears.sex=='M'),]
maleNameCounts = df.groupby(['Year', 'Name']).size().nlargest(10).reset_index().rename(columns={0:'count'})
maleNameCounts
Upvotes: 1
Reputation: 1100
Hope this helps:
Add a column with counts
df["name_count"] = df[name].map(df.name.value_counts())
Optional to remove duplicates
df = df.drop_duplicates(["name"])
Sort (by counts)
df = df.sort_values("name_count")
Note that this can all be tweaked were necessary.
Upvotes: 1