Reputation: 445
I have a Pandas dataframe with datetime64 column.
date
2022-01-30 10:57:00
2022-01-30 12:02:00
2022-01-30 18:58:00
2022-01-30 19:27:00
2022-01-30 12:00:00
2022-01-31 02:10:00
2022-01-31 01:55:00
2022-01-31 01:24:00
2022-01-31 02:37:00
...
What I need is to set working shifts (I don't know if shift is correct for turno in spanish...) with this conditions
to store it in a new column:
date shift
2022-01-30 10:57:00 Morning
2022-01-30 12:02:00 Morning
2022-01-30 18:58:00 Afternoon
2022-01-30 19:27:00 Afternoon
2022-01-30 12:00:00 Morning
2022-01-31 02:10:00 Night
2022-01-31 01:55:00 Night
2022-01-31 01:24:00 Night
...
I have done this with this code (it works fine).
startTime = pd.to_datetime(entradaDf['startDate']).dt.strftime('%H:%M:%S')
condShift = [
startTime.ge('05:00:00') & startTime.lt('13:30:00'), # Morning
startTime.ge('13:30:00') & startTime.lt('22:00:00'), # Afternoon
startTime.ge('22:00:00') & startTime.lt('00:00:00'), # Night
startTime.ge('00:00:00') & startTime.lt('05:00:00'), # Night
]
shiftValues = ['Mañana', 'Tarde', 'Noche', 'Noche']
df['shift'] = np.select(condShift , shiftValues , default='Night')
It work fine but, is there a way to do this more efficiently?
Upvotes: 1
Views: 539
Reputation: 120479
You can use np.select
:
import numpy as np
from datetime import time
df['date'] = pd.to_datetime(df['date'])
condlist = [df['date'].dt.time.between(time(5), time(13, 30), inclusive='left'),
df['date'].dt.time.between(time(13, 30), time(22), inclusive='left')]
choicelist = ['Mañana', 'Tarde']
df['shift'] = np.select(condlist=condlist, choicelist=choicelist, default='Noche')
Output:
>>> df
date shift
0 2022-01-30 10:57:00 Mañana
1 2022-01-30 12:02:00 Mañana
2 2022-01-30 18:58:00 Tarde
3 2022-01-30 19:27:00 Tarde
4 2022-01-30 12:00:00 Mañana
5 2022-01-31 02:10:00 Noche
6 2022-01-31 01:55:00 Noche
7 2022-01-31 01:24:00 Noche
8 2022-01-31 02:37:00 Noche
Upvotes: 4
Reputation: 30609
You can use cut
on the time part of the dates, converted to seconds from midnight:
df['shift'] = pd.cut(df.date.dt.hour * 3600 + df.date.dt.minute * 60 + df.date.dt.second,
[0, 5 * 3600, (13 * 60 + 30) * 60 , 22 * 3600, 24 * 3600],
right=False,
labels=['Noche', 'Mañana', 'Tarde', 'Noche'],
ordered=False)
Result:
date shift
0 2022-01-30 10:57:00 Mañana
1 2022-01-30 12:02:00 Mañana
2 2022-01-30 18:58:00 Tarde
3 2022-01-30 19:27:00 Tarde
4 2022-01-30 12:00:00 Mañana
5 2022-01-31 02:10:00 Noche
6 2022-01-31 01:55:00 Noche
7 2022-01-31 01:24:00 Noche
8 2022-01-31 02:37:00 Noche
Upvotes: 2
Reputation: 3076
I always use pandas' apply()
function to transform a row or column into some new row or column. You can use something like this:
from io import StringIO
import pandas as pd
import numpy as np
# Create your data frame
input_string = """startDate
2022-01-30 10:57:00
2022-01-30 12:02:00
2022-01-30 18:58:00
2022-01-30 19:27:00
2022-01-30 12:00:00
2022-01-31 02:10:00
2022-01-31 01:55:00
2022-01-31 01:24:00
2022-01-31 02:37:00"""
data = StringIO(input_string)
entradaDf = pd.read_csv(data, sep=' ', engine='python')
# Create series with time only
time_series = pd.to_datetime(entradaDf['startDate']).dt.strftime('%H:%M:%S')
# Function defining what shift we have
def defineShift(startTime):
if startTime >= '05:00:00' and startTime < '13:30:00':
return 'Mañana'
if startTime >= '13:30:00' and startTime < '22:00:00':
return 'Tarde'
return 'Noche'
# Apply transformation of time to shift and save in the df
entradaDf['newShift'] = time_series.apply(lambda x: defineShift(x))
entradaDf
Output:
startDate newShift
0 2022-01-30 10:57:00 Mañana
1 2022-01-30 12:02:00 Mañana
2 2022-01-30 18:58:00 Tarde
3 2022-01-30 19:27:00 Tarde
4 2022-01-30 12:00:00 Mañana
5 2022-01-31 02:10:00 Noche
6 2022-01-31 01:55:00 Noche
7 2022-01-31 01:24:00 Noche
8 2022-01-31 02:37:00 Noche
Upvotes: 0