Reputation: 135
I have a variable like this in pandas that list specific time
Time
19:05
20:18
20:44
20:51
06:11
06:23
08:37
11:59
What I'd like to have is a column with the adjusted rounded time (eg 11:59 -> 12:00 or 19:05 -> 19:00) and another column where the hours between 22:00 till 06:00 are considered "night" and the hours between 07:00 and 21:00 are considered "day"
Time Time_rnd DN
19:05 19:00 Day
20:18 20:00 Day
20:44 21:00 Day
20:51 21:00 Day
23:34 23:00 Night
03:11 03:00 Night
04:23 04:00 Night
08:37 09:00 Day
11:59 12:00 Day
Is it possible to do something like this? Thank you
Upvotes: 0
Views: 387
Reputation: 78
import pandas as pd
df_time = pd.DataFrame(
[
'19:05',
'20:18',
'20:44',
'20:51',
'06:11',
'06:23',
'08:37',
'11:59',
],
columns=['time']
)
# Convert the hour-strings into date-time. Set as index.
df_time.index = pd.to_datetime(df_time['time']).dt.round('H')
# Create new column--default value "Night"
df_time['time of day'] = 'Night'
# Filter for items between specified times--set to "Day"
df_time.loc[df_time.between_time('07:00', '21:00').index, 'time of day'] = 'Day'
time time of day
time
2021-07-10 19:00:00 19:05 Day
2021-07-10 20:00:00 20:18 Day
2021-07-10 21:00:00 20:44 Day
2021-07-10 21:00:00 20:51 Day
2021-07-10 06:00:00 06:11 Night
2021-07-10 06:00:00 06:23 Night
2021-07-10 09:00:00 08:37 Day
2021-07-10 12:00:00 11:59 Day
It's a different approach than specifically what you've outlined, but using ts.between_time()
appears more elegant, IMO.
Edit: OP Asked about removing the dates from the dataframe, since they weren't in the original prompt. It appears they would best be removed from the index as well.
Adding the following to the bottom of the script above will accomplish this:
# change the index to a formatted string, to remove the date
df_time.index = pd.Index(df_time.index.strftime('%H:%M'), name='rounded time')
# reset the index to remove the stringified datetime out of the index
df_time = df_time.reset_index()
rounded time time time of day
0 19:00 19:05 Day
1 20:00 20:18 Day
2 21:00 20:44 Day
3 21:00 20:51 Day
4 06:00 06:11 Night
5 06:00 06:23 Night
6 09:00 08:37 Day
7 12:00 11:59 Day
Upvotes: 3
Reputation: 7353
If you want to use your own logic to process the data without using any datetime specific function, here's how you could do it.
Assumption: Your
Time
column should have all entries asstr
. If not, convert that column into string withdf['Time'].astype(str)
and then pass it to the functionprepare_data(df = df['Time'].astype(str))
.
import pandas as pd
def prepare_data(df: pd.DataFrame,
time_column: str = 'Time',
day_time_start: int = 7,
day_time_end: int = 23,
prune: bool = True):
"""Returns processed data (as a dataframe) for a given time-column."""
_df2 = (df[time_column]
.str.split(':', expand=True)
.rename(columns={0: 'HH', 1: 'MM'})
.astype(int))
dfx = pd.concat([df[time_column], _df2], axis=1)
dfx['Hour'] = dfx.HH + dfx.MM.between(30, 59)
dfx['Hour'] = dfx.Hour + (dfx.Hour // 24) * (-24)
dfx['AdjustedTime'] = dfx.Hour.astype(str).apply(lambda x: x.zfill(2)) + ':00'
dfx['DayOrNight'] = (dfx.Hour
.between(day_time_start, day_time_end)
.apply(lambda x: 'Day' if x else 'Night'))
if prune:
return dfx.loc[:, [time_column, 'AdjustedTime', 'DayOrNight']]
return dfx
dfx = prepare_data(df, time_column='Time')
dfx
import pandas as pd
df = pd.DataFrame(
[
'19:05',
'20:18',
'20:44',
'20:51',
'06:11',
'06:23',
'08:37',
'11:59',
'02:22',
'23:58',
'23:18',
'03:35',
],
columns=['Time'],
)
Upvotes: 0