Reputation: 427
New to pandas and trying to get better. I would like to group_by the "id" column, then order_by "col2" then "col3", then take the top observation. Thanks in advance
In[5]:df
Out[5]:
id col2 col3
0 1 2 5
1 1 2 2
2 2 8 3
3 2 3 6
4 2 3 1
5 1 3 7
6 2 5 6
7 1 8 6
Desired:
In[6]:df_selection
Out[6]:
id col2 col3
0 1 2 2
1 2 3 1
Upvotes: 1
Views: 37
Reputation: 76297
A very short way to do this would be to sort by these columns, group, and take the first of each group:
>>> df.sort_values(by=['col2', 'col3']).groupby('id').first()
col2 col3
id
1 2 2
2 3 1
To get the exact output as your question, just reset the index:
>>> df.sort_values(by=['col2', 'col3']).groupby('id').first().reset_index()
id col2 col3
0 1 2 2
1 2 3 1
Upvotes: 1