chinskiy
chinskiy

Reputation: 2715

Group by values across two columns and filter in Pandas

I have a DataFrame like this:

    name    sex births  year
0   Mary    F   7433    2000
1   John    M   6542    2000
2   Emma    F   2342    2000
3   Ron     M   5432    2001
4   Bessie  F   4234    2001
5   Jennie  F   2413    2002
6   Nick    M   2343    2002
7   Ron     M   4342    2002

I need to get new DataFrame where data will be grouped by year and sex, and last two columns will be name with max births and max (births) value, like this:

    year   sex  name     births
0   2000   F    Mary     7433
1   2000   M    John     6542
2   2001   F    Bessie   4234
3   2001   M    Ron      5432   
4   2002   F    Jennie   2413
5   2002   M    Ron      4342

Upvotes: 2

Views: 2308

Answers (1)

Alex Riley
Alex Riley

Reputation: 176938

It can be done using the following groupby operation:

>>> df.groupby(['year', 'sex'], as_index=False).max()
   year sex    name  births
0  2000   F    Mary    7433
1  2000   M    John    6542
2  2001   F  Bessie    4234
3  2001   M     Ron    5432
4  2002   F  Jennie    2413
5  2002   M     Ron    4342

as_index=False stops the groupby keys from becoming the index in the returned DataFrame.

Alternatively, to get the desired output you may need to to sort the 'births' column and then use groupby.first():

df = df.sort_values(by='births', ascending=False)
df.groupby(['year', 'sex'], as_index=False).first()

Upvotes: 4

Related Questions