Diana Mele
Diana Mele

Reputation: 135

Round and groupby hour values in pandas

I have a variable like this in pandas that list specific time

Time
19:05
20:18
20:44
20:51
06:11
06:23
08:37
11:59

What I'd like to have is a column with the adjusted rounded time (eg 11:59 -> 12:00 or 19:05 -> 19:00) and another column where the hours between 22:00 till 06:00 are considered "night" and the hours between 07:00 and 21:00 are considered "day"

Time    Time_rnd     DN
19:05     19:00      Day
20:18     20:00      Day
20:44     21:00      Day
20:51     21:00      Day
23:34     23:00      Night
03:11     03:00      Night
04:23     04:00      Night
08:37     09:00      Day
11:59     12:00      Day

Is it possible to do something like this? Thank you

Upvotes: 0

Views: 387

Answers (2)

Walter Zielenski
Walter Zielenski

Reputation: 78

import pandas as pd

df_time = pd.DataFrame(
    [
        '19:05',
        '20:18',
        '20:44',
        '20:51',
        '06:11',
        '06:23',
        '08:37',
        '11:59',
    ],
    columns=['time']
)

# Convert the hour-strings into date-time. Set as index.
df_time.index = pd.to_datetime(df_time['time']).dt.round('H')

# Create new column--default value "Night"
df_time['time of day'] = 'Night'

# Filter for items between specified times--set to "Day"
df_time.loc[df_time.between_time('07:00', '21:00').index, 'time of day'] = 'Day'
                    time    time of day
time        
2021-07-10 19:00:00 19:05   Day
2021-07-10 20:00:00 20:18   Day
2021-07-10 21:00:00 20:44   Day
2021-07-10 21:00:00 20:51   Day
2021-07-10 06:00:00 06:11   Night
2021-07-10 06:00:00 06:23   Night
2021-07-10 09:00:00 08:37   Day
2021-07-10 12:00:00 11:59   Day

It's a different approach than specifically what you've outlined, but using ts.between_time() appears more elegant, IMO.


Edit: OP Asked about removing the dates from the dataframe, since they weren't in the original prompt. It appears they would best be removed from the index as well.

Adding the following to the bottom of the script above will accomplish this:

# change the index to a formatted string, to remove the date
df_time.index = pd.Index(df_time.index.strftime('%H:%M'), name='rounded time')

# reset the index to remove the stringified datetime out of the index
df_time = df_time.reset_index()
    rounded time    time    time of day
0   19:00           19:05   Day
1   20:00           20:18   Day
2   21:00           20:44   Day
3   21:00           20:51   Day
4   06:00           06:11   Night
5   06:00           06:23   Night
6   09:00           08:37   Day
7   12:00           11:59   Day

Upvotes: 3

CypherX
CypherX

Reputation: 7353

An Alternative

If you want to use your own logic to process the data without using any datetime specific function, here's how you could do it.

Assumption: Your Time column should have all entries as str. If not, convert that column into string with df['Time'].astype(str) and then pass it to the function prepare_data(df = df['Time'].astype(str)).

import pandas as pd

def prepare_data(df: pd.DataFrame,
                 time_column: str = 'Time',
                 day_time_start: int = 7,
                 day_time_end: int = 23,
                 prune: bool = True):
    """Returns processed data (as a dataframe) for a given time-column."""

    _df2 = (df[time_column]
                .str.split(':', expand=True)
                .rename(columns={0: 'HH', 1: 'MM'})
                .astype(int))
    dfx = pd.concat([df[time_column], _df2], axis=1)
    dfx['Hour'] = dfx.HH + dfx.MM.between(30, 59)
    dfx['Hour'] = dfx.Hour + (dfx.Hour // 24) * (-24)
    dfx['AdjustedTime'] = dfx.Hour.astype(str).apply(lambda x: x.zfill(2)) + ':00'
    dfx['DayOrNight'] = (dfx.Hour
                         .between(day_time_start, day_time_end)
                         .apply(lambda x: 'Day' if x else 'Night'))
    if prune:
        return dfx.loc[:, [time_column, 'AdjustedTime', 'DayOrNight']]
    return dfx

dfx = prepare_data(df, time_column='Time')
dfx

enter image description here

Dummy Data

import pandas as pd

df = pd.DataFrame(
    [
        '19:05',
        '20:18',
        '20:44',
        '20:51',
        '06:11',
        '06:23',
        '08:37',
        '11:59',
        '02:22',
        '23:58',
        '23:18',
        '03:35',
    ],
    columns=['Time'],
)

Upvotes: 0

Related Questions