Reputation: 139
I am looking to group multiple columns in a dataframe, keep only the max Value, and keeping the corresponding date column
Below is how the dataframe looks like:
Index | Site | Device Type | Value | Time |
---|---|---|---|---|
0 | AAA | A | 10 | 2021-02-02 01:30:00 |
1 | AAA | A | 5 | 2021-02-02 01:35:00 |
2 | AAA | B | 2 | 2021-02-02 01:40:00 |
3 | BBB | C | 3 | 2021-02-02 02:00:00 |
4 | BBB | C | 11 | 2021-02-02 02:05:00 |
5 | BBB | C | 20 | 2021-02-02 02:10:00 |
6 | BBB | D | 30 | 2021-02-02 04:00:00 |
I am trying to get the following output:
Index | Site | Device Type | Value | Time |
---|---|---|---|---|
0 | AAA | A | 10 | 2021-02-02 01:30:00 |
1 | AAA | B | 2 | 2021-02-02 01:40:00 |
2 | BBB | C | 20 | 2021-02-02 02:10:00 |
3 | BBB | D | 30 | 2021-02-02 04:00:00 |
When I try the following groupby, the Time column drops:
df_max = df.groupby(['Site','Device Type'],as_index=False)['Value'].max()
I am looking to Keep the Time value corresponding to the maximum value found.
Thank you
Upvotes: 7
Views: 8896
Reputation: 13349
You are very close. try using idxmax
and show rows at that location:
df.loc[df.groupby(['Site','Device Type'])['Value'].idxmax()].reset_index(drop=True)
Index Site Device Type Value Time
0 0 AAA A 10 2021-02-02 01:30:00
1 2 AAA B 2 2021-02-02 01:40:00
2 5 BBB C 20 2021-02-02 02:10:00
3 6 BBB D 30 2021-02-02 04:00:00
Upvotes: 11