Reputation: 175
I have a Pandas DataFrame that looks like this:
index | ID | value_1 | value_2 |
---|---|---|---|
0 | 1 | 200 | 126 |
1 | 1 | 200 | 127 |
2 | 1 | 200 | 128.1 |
3 | 1 | 200 | 125.7 |
4 | 2 | 300.1 | 85 |
5 | 2 | 289.4 | 0 |
6 | 2 | 0 | 76.9 |
7 | 2 | 199.7 | 0 |
My aim is to find all rows in each ID
-group (1,2 in this example) which have the max value for value_1
column. The second condition is if there are multiple maximum values per group, the row where the value in column value_2
is maximum should be taken.
So the target table should look like this:
index | ID | value_1 | value_2 |
---|---|---|---|
0 | 1 | 200 | 128.1 |
1 | 2 | 300.1 | 85 |
Upvotes: 1
Views: 112
Reputation: 863801
Use DataFrame.sort_values
by all 3 columns and then DataFrame.drop_duplicates
:
df1 = (df.sort_values(['ID', 'value_1', 'value_2'], ascending=[True, False, False])
.drop_duplicates('ID'))
print (df1)
ID value_1 value_2
2 1 200.0 128.1
4 2 300.1 85.0
Upvotes: 2