Starbucks
Starbucks

Reputation: 1558

Python - Pandas - Insert Single Value on One Row per Date

I need to create a column, for every 24-hours interval, the first date record should contain the max value of df.score.

So the new column, daily_max_score, should contain the df.score.max() just once (on the first daily record, usually at 00:00:00, but not always at that hour and minute).

import pandas as pd

df = pd.DataFrame({
    'date': ['2019-04-19 23:00:00','2019-04-20 00:00:00','2019-04-20 01:00:00', '2019-04-05 08:00:00',
             '2019-07-31 23:30:00','2019-08-01 00:00:00','2019-08-01 01:00:00', '2019-08-01 02:00:00'],
    'ID': ['ID F', 'ID F', 'ID F', 'ID F',
             'ID B', 'ID B', 'ID B', 'ID B'],
    'score': ['50', '100', '99', '99',
             '75', '25', '25', '80']})

df['date'] = pd.to_datetime(df['date'], infer_datetime_format=True)

df

date    ID  score
0   2019-04-19 23:00:00 ID F    50
1   2019-04-20 00:00:00 ID F    100
2   2019-04-20 01:00:00 ID F    99
3   2019-04-05 08:00:00 ID F    99
4   2019-07-31 23:30:00 ID B    75
5   2019-08-01 00:00:00 ID B    25
6   2019-08-01 01:00:00 ID B    25
7   2019-08-01 02:00:00 ID B    80

Desired DataFrame:

date                    ID      score   daily_max_score
0   2019-04-19 23:00:00 ID F    50      50
1   2019-04-20 00:00:00 ID F    100     100
2   2019-04-20 01:00:00 ID F    99      NaN
3   2019-04-05 08:00:00 ID F    99      99
4   2019-07-31 23:30:00 ID B    75      75
5   2019-08-01 00:00:00 ID B    25      80
6   2019-08-01 01:00:00 ID B    25      NaN
7   2019-08-01 02:00:00 ID B    80      NaN

Upvotes: 1

Views: 191

Answers (1)

Alexander
Alexander

Reputation: 109520

First, convert your text date values into actual datetimes via df['date'] = pd.to_datetime(df['date']).

Then group on the dates via df.groupby(df['date'].dt.date).

Get the max daily score for each date, first converting the text values to integers for the purpose of getting the max value (vs. max lex sorted value). Convert the result back to a string.

Create a dataframe of the daily max scores, using the index location of the minimum datetime for each date, i.e. index=gb['date'].idxmin().

This dataframe of the indexed daily_max_score can now be joined to the original dataframe (which joins on index by default), producing the desired result.

Note that both the score and daily_max_score are still strings (i.e. objects), which is how they were originally assigned and thus inferred to be the desired type.

df['date'] = pd.to_datetime(df['date'])
gb = df.groupby(df['date'].dt.date)
max_daily_scores = gb['score'].apply(lambda x: x.astype(int).max()).astype(str)
daily_max_score = pd.DataFrame(
    max_daily_scores.tolist(), 
    index=gb['date'].idxmin(), 
    columns=['daily_max_score']
)

>>> df.join(daily_max_score)
                 date    ID score daily_max_score
0 2019-04-19 23:00:00  ID F    50              50
1 2019-04-20 00:00:00  ID F   100             100
2 2019-04-20 01:00:00  ID F    99             NaN
3 2019-04-05 08:00:00  ID F    99              99
4 2019-07-31 23:30:00  ID B    75              75
5 2019-08-01 00:00:00  ID B    25              80
6 2019-08-01 01:00:00  ID B    25             NaN
7 2019-08-01 02:00:00  ID B    80             NaN

Upvotes: 1

Related Questions