Reputation: 3485
I have a dataframe with some timeseries inside. It has 3 columns: day, hour and value:
day | hour | value |
---|---|---|
12-Jan | 11-00 | 14 |
12-Jan | 12-00 | 100 |
12-Jan | 13-00 | 345 |
12-Jan | 14-00 | 195 |
13-Jan | 12-00 | 76 |
13-Jan | 13-00 | 221 |
13-Jan | 14-00 | 102 |
13-Jan | 15-00 | 395 |
As you can see, the max value for 12-Jan is observed at 13-00.
I want to calculate the max of "value" for each "day". I can do it with a simple
df.groupby("day")["value"].max()
It works, but after grouping the hour information is lost. The question is: how can I build a DataFrame that contains each day's max value along with the hour when that value was observed, i.e.
day | hour when maxValue was observed | maxValue |
---|---|---|
12-Jan | 13-00 | 345 |
13-Jan | 15-00 | 395 |
?
Upvotes: 1
Views: 1269
Reputation: 13821
EDIT
I created a sample of your df:
day hour value
0 2021-01-12 11-00 14
1 2021-01-12 12-00 100
2 2021-01-12 13-00 345
3 2021-01-12 14-00 195
4 2021-01-13 12-00 76
5 2021-01-13 13-00 221
6 2021-01-13 14-00 102
7 2021-01-13 15-00 395
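For anyone who wants to reproduce this, the sample can be built roughly as follows. This is a minimal sketch: the year 2021 and the datetime dtype for `day` are assumptions made for the sample, since the question only shows day and month.

```python
import pandas as pd

# Sample data copied from the question; the year 2021 is an assumption.
df = pd.DataFrame({
    "day": pd.to_datetime(["2021-01-12"] * 4 + ["2021-01-13"] * 4),
    "hour": ["11-00", "12-00", "13-00", "14-00",
             "12-00", "13-00", "14-00", "15-00"],
    "value": [14, 100, 345, 195, 76, 221, 102, 395],
})
```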
And ran this code on it:
import pandas as pd
res = pd.merge(df.groupby('day').agg({'value': 'max'}).add_prefix('max_'), df, how='left', left_on='max_value', right_on='value')
And got back:
max_value day hour value
0 345 2021-01-12 13-00 345
1 395 2021-01-13 15-00 395
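One caveat with a merge on the value alone: if a day's max also appears as an ordinary value on another day, rows from that other day will match too. A possible alternative is to look up the row of each day's max with `idxmax` and select those rows directly, which keeps the hour column. This is a minimal sketch, assuming the frame has a unique index (the default integer index here); the names `idx` and `res_idxmax` are just for illustration.

```python
import pandas as pd

# Same data as in the question, with "day" kept as plain strings.
df = pd.DataFrame({
    "day": ["12-Jan"] * 4 + ["13-Jan"] * 4,
    "hour": ["11-00", "12-00", "13-00", "14-00",
             "12-00", "13-00", "14-00", "15-00"],
    "value": [14, 100, 345, 195, 76, 221, 102, 395],
})

# For each day, idxmax gives the index label of the row holding the max value.
idx = df.groupby("day")["value"].idxmax()

# Select those full rows, so "hour" comes along with the max "value".
res_idxmax = df.loc[idx].reset_index(drop=True)
print(res_idxmax)
```

If several rows in a day tie for the max, `idxmax` picks the first one, so you get exactly one row per day.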
Upvotes: 2