Reputation: 2715
I have a DataFrame like this:
name sex births year
0 Mary F 7433 2000
1 John M 6542 2000
2 Emma F 2342 2000
3 Ron M 5432 2001
4 Bessie F 4234 2001
5 Jennie F 2413 2002
6 Nick M 2343 2002
7 Ron M 4342 2002
I need to get new DataFrame where data will be grouped by year and sex, and last two columns will be name with max births and max (births) value, like this:
year sex name births
0 2000 F Mary 7433
1 2000 M John 6542
2 2001 F Bessie 4234
3 2001 M Ron 5432
4 2002 F Jennie 2413
5 2002 M Ron 4342
Upvotes: 2
Views: 2308
Reputation: 176938
It can be done using the following groupby
operation:
>>> df.groupby(['year', 'sex'], as_index=False).max()
year sex name births
0 2000 F Mary 7433
1 2000 M John 6542
2 2001 F Bessie 4234
3 2001 M Ron 5432
4 2002 F Jennie 2413
5 2002 M Ron 4342
as_index=False
stops the groupby keys from becoming the index in the returned DataFrame.
Alternatively, to get the desired output you may need to to sort the 'births' column and then use groupby.first()
:
df = df.sort_values(by='births', ascending=False)
df.groupby(['year', 'sex'], as_index=False).first()
Upvotes: 4