I have a dataset of Oscar winners with the following columns: name of winner, award, place of birth, date of birth and year of award. I want to count how many rows there are per year. For example, if for 2005 we have the winners of best director and best actor, and for 2006 the winner of best supporting actor, I want a result like this:
year_of_award  number_of_rows
2005           2
2006           1
It seems simple, but I can't get it right. Most posts I found recommend combining groupby with count(). However, when I run the code below, I get a count for every column: the result has the year plus the other four columns, each filled with the number of rows.
df.groupby(['year_of_award']).count()
How can I get just the year and the number of rows?
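A minimal reproduction of the behaviour described above, assuming hypothetical sample data. Only `year_of_award` and `award` appear in the post; the column names `name`, `place_of_birth`, `date_of_birth` and all values are invented for illustration.

```python
import pandas as pd

# Hypothetical sample data: two winners in 2005, one in 2006.
# Column names other than 'year_of_award' and 'award' are assumptions.
df = pd.DataFrame({
    "name": ["Winner A", "Winner B", "Winner C"],
    "award": ["Best Director", "Best Actor", "Best Supporting Actor"],
    "place_of_birth": ["City A", "City B", "City C"],
    "date_of_birth": ["1950-01-01", "1960-02-02", "1970-03-03"],
    "year_of_award": [2005, 2005, 2006],
})

# count() returns the number of non-null values for every non-grouping
# column, so each of the four remaining columns repeats the per-year count.
result = df.groupby(["year_of_award"]).count()
print(result)
```

Because `count()` is applied to every remaining column, the result has one column per original column rather than a single `number_of_rows` column.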
For pandas 0.25+, use named aggregation:
df.groupby(['year_of_award']).agg(number_of_rows=('award', 'count'))
For older versions:
df.groupby(['year_of_award']).agg({'award': 'count'}).rename(columns={'award': 'number_of_rows'})
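A sketch running both variants on hypothetical sample data that mirrors the question (two 2005 winners, one 2006 winner). The trailing `reset_index()` is an addition so that the year becomes a regular column, matching the desired two-column output. The `size()` line is an alternative not taken from the answer: it counts every row in each group, whereas `count()` on `award` skips rows where `award` is NaN.

```python
import pandas as pd

# Hypothetical sample data: two winners in 2005, one in 2006.
df = pd.DataFrame({
    "award": ["Best Director", "Best Actor", "Best Supporting Actor"],
    "year_of_award": [2005, 2005, 2006],
})

# pandas 0.25+: named aggregation creates the output column directly.
named = (
    df.groupby("year_of_award")
    .agg(number_of_rows=("award", "count"))
    .reset_index()
)

# Older pandas: dict aggregation keeps the source column name ('award'),
# so that is the column that has to be renamed.
legacy = (
    df.groupby("year_of_award")
    .agg({"award": "count"})
    .rename(columns={"award": "number_of_rows"})
    .reset_index()
)

# Alternative: size() counts all rows per group, including NaN values.
sized = df.groupby("year_of_award").size().reset_index(name="number_of_rows")

print(named)
```

All three produce the columns `year_of_award` and `number_of_rows`; they differ only if `award` contains missing values.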