Reputation: 1558
I need to create a column, for every 24-hours interval, the first date record should contain the max value of df.score
.
So the new column, daily_max_score
, should contain the df.score.max()
just once (on the first daily record, usually at 00:00:00, but not always at that hour and minute).
import pandas as pd
df = pd.DataFrame({
'date': ['2019-04-19 23:00:00','2019-04-20 00:00:00','2019-04-20 01:00:00', '2019-04-05 08:00:00',
'2019-07-31 23:30:00','2019-08-01 00:00:00','2019-08-01 01:00:00', '2019-08-01 02:00:00'],
'ID': ['ID F', 'ID F', 'ID F', 'ID F',
'ID B', 'ID B', 'ID B', 'ID B'],
'score': ['50', '100', '99', '99',
'75', '25', '25', '80']})
df['date'] = pd.to_datetime(df['date'], infer_datetime_format=True)
df
date ID score
0 2019-04-19 23:00:00 ID F 50
1 2019-04-20 00:00:00 ID F 100
2 2019-04-20 01:00:00 ID F 99
3 2019-04-05 08:00:00 ID F 99
4 2019-07-31 23:30:00 ID B 75
5 2019-08-01 00:00:00 ID B 25
6 2019-08-01 01:00:00 ID B 25
7 2019-08-01 02:00:00 ID B 80
Desired DataFrame:
date ID score daily_max_score
0 2019-04-19 23:00:00 ID F 50 50
1 2019-04-20 00:00:00 ID F 100 100
2 2019-04-20 01:00:00 ID F 99 NaN
3 2019-04-05 08:00:00 ID F 99 99
4 2019-07-31 23:30:00 ID B 75 75
5 2019-08-01 00:00:00 ID B 25 80
6 2019-08-01 01:00:00 ID B 25 NaN
7 2019-08-01 02:00:00 ID B 80 NaN
Upvotes: 1
Views: 191
Reputation: 109520
First, convert your text date values into actual datetimes via df['date'] = pd.to_datetime(df['date'])
.
Then group on the dates via df.groupby(df['date'].dt.date)
.
Get the max daily score for each date, first converting the text values to integers for the purpose of getting the max value (vs. max lex sorted value). Convert the result back to a string.
Create a dataframe of the daily max scores, using the index location of the minimum datetime for each date, i.e. index=gb['date'].idxmin()
.
This dataframe of the indexed daily_max_score
can now be joined to the original dataframe (which joins on index by default), producing the desired result.
Note that both the score
and daily_max_score
are still strings (i.e. objects), which is how they were originally assigned and thus inferred to be the desired type.
df['date'] = pd.to_datetime(df['date'])
gb = df.groupby(df['date'].dt.date)
max_daily_scores = gb['score'].apply(lambda x: x.astype(int).max()).astype(str)
daily_max_score = pd.DataFrame(
max_daily_scores.tolist(),
index=gb['date'].idxmin(),
columns=['daily_max_score']
)
>>> df.join(daily_max_score)
date ID score daily_max_score
0 2019-04-19 23:00:00 ID F 50 50
1 2019-04-20 00:00:00 ID F 100 100
2 2019-04-20 01:00:00 ID F 99 NaN
3 2019-04-05 08:00:00 ID F 99 99
4 2019-07-31 23:30:00 ID B 75 75
5 2019-08-01 00:00:00 ID B 25 80
6 2019-08-01 01:00:00 ID B 25 NaN
7 2019-08-01 02:00:00 ID B 80 NaN
Upvotes: 1