Reputation: 1114
I've got a pandas Series, df['inc_cr_date'], that holds dates in the format 2017-10-27 08:00:26.808.
I would like to check whether the time of day in this column is later than 18:30. The problem is that if I use the code below:
# All the conditions can be reduced to one mask and result
days_one = ['Monday','Tuesday','Wednesday','Thursday']
days_two = days_one + ['Friday']
# Returns a boolean mask
m1 = df['inc_cr_date_day'].isin(days_one) & (df['inc_cr_date'].dt.hour > 18 ) & (df['inc_cr_date'].dt.minute > 30)
m2 = df['inc_cr_date_day'].isin(days_two) & (df['inc_cr_date'].dt.hour < 9 ) & (df['inc_cr_date'].dt.minute < 30)
# Repeated result can be stored in one variable
r1 = (df['inc_cr_date']+pd.Timedelta('1 days')).dt.normalize() + pd.Timedelta('9 Hours 30 Minutes')
r2 = (df['inc_cr_date']+pd.Timedelta('0 days')).dt.normalize() + pd.Timedelta('9 Hours 30 Minutes')
df['inc_cr_date_adjusted'] = np.select([
m1, m2,
(df['inc_cr_date_day'] == 'Saturday'),
(df['inc_cr_date_day'] == 'Sunday'),
((df['inc_cr_date_day'] == 'Friday') & (df['inc_cr_date'].dt.hour > 18) & (df['inc_cr_date'].dt.minute > 30)),
],
[r1, r2,
(df['inc_cr_date']+pd.Timedelta('2 days')).dt.normalize() + pd.Timedelta('9 Hours 30 Minutes'),
(df['inc_cr_date']+pd.Timedelta('1 days')).dt.normalize() + pd.Timedelta('9 Hours 30 Minutes'),
(df['inc_cr_date']+pd.Timedelta('3 days')).dt.normalize() + pd.Timedelta('9 Hours 30 Minutes')
],
df['inc_cr_date'])
This gives wrong results, because the hour and minute conditions are evaluated independently: a time such as 19:10 passes the hour test but fails the minute test, even though it is later than 18:30. How can I compare the ['inc_cr_date'] column with 18:30 as a single time, and NOT with hour and minute separately?
I've also tried pandas.DataFrame.between_time, as below:
start = datetime.time(18,30,0)
end = datetime.time(23,59,0)
df['inc_cr_date'].between_time(start, end)
But I get the error:
TypeError: Index must be DatetimeIndex
Even if I do:
df['inc_cr_date'] = pd.DatetimeIndex(df['inc_cr_date'])
The DataFrame looks like this:
inc_cr_date inc_cr_date_day
0 2017-10-26 21:59:28.075 Thursday 2017-10-27
1 2017-10-21 16:49:58.722 Saturday 2017-10-23
2 2017-10-11 09:30:05.258 Wednesday 2017-10-11
The output should look like this:
inc_cr_date inc_cr_date_day inc_cr_date_adjusted
0 2017-10-26 21:20:28.075 Thursday 2017-10-27 09:30:00.000
1 2017-10-21 16:49:58.722 Saturday 2017-10-23 09:30:00.000
2 2017-10-11 09:30:05.258 Wednesday 2017-10-11 09:30:05.258
Please advise. Thanks.
Upvotes: 1
Views: 1001
Reputation: 806
How about this?
df['timeFlag'] = df['inc_cr_date'].apply(lambda x: 1 if x.time() > datetime.time(18, 30, 0) else 0)
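If the column is large, the same comparison can probably be done without apply by turning each timestamp into "time elapsed since midnight". This is only a minimal sketch, assuming inc_cr_date is already a datetime64 column; the names time_of_day, after_close and before_open are made up for illustration:

```python
import pandas as pd

# Sample rows taken from the question
df = pd.DataFrame({
    "inc_cr_date": pd.to_datetime([
        "2017-10-26 21:59:28.075",
        "2017-10-21 16:49:58.722",
        "2017-10-11 09:30:05.258",
    ])
})

# Subtracting midnight of the same day leaves the time of day as a timedelta
time_of_day = df["inc_cr_date"] - df["inc_cr_date"].dt.normalize()

# One comparison per threshold, so hour and minute are never tested separately
after_close = time_of_day > pd.Timedelta(hours=18, minutes=30)
before_open = time_of_day < pd.Timedelta(hours=9, minutes=30)
```

Both results are boolean Series, so they can be combined with the weekday masks using &.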
In your example, you can change these lines:
# Returns a boolean mask
m1 = df['inc_cr_date_day'].isin(days_one) & (df['inc_cr_date'].dt.hour > 18 ) & (df['inc_cr_date'].dt.minute > 30)
m2 = df['inc_cr_date_day'].isin(days_two) & (df['inc_cr_date'].dt.hour < 9 ) & (df['inc_cr_date'].dt.minute < 30)
to:
# Returns a boolean mask
# m1: Monday to Thursday, later than 18:30
m1 = df['inc_cr_date_day'].isin(days_one) & df['inc_cr_date'].apply(lambda x: x.time() > datetime.time(18, 30, 0))
# m2: Monday to Friday, earlier than 09:30
m2 = df['inc_cr_date_day'].isin(days_two) & df['inc_cr_date'].apply(lambda x: x.time() < datetime.time(9, 30, 0))
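Putting it together, the whole adjustment might look roughly like the sketch below. It is not the code from the question: it assumes the weekday can be taken from dt.dayofweek instead of the inc_cr_date_day string column, and it picks a number of days to roll forward rather than passing datetime choices to np.select. The names opening, closing and days_forward are illustrative only.

```python
import numpy as np
import pandas as pd

# Sample rows taken from the question
df = pd.DataFrame({
    "inc_cr_date": pd.to_datetime([
        "2017-10-26 21:59:28.075",  # Thursday, after 18:30
        "2017-10-21 16:49:58.722",  # Saturday
        "2017-10-11 09:30:05.258",  # Wednesday, inside working hours
    ])
})

ts = df["inc_cr_date"]
weekday = ts.dt.dayofweek            # Monday=0 ... Sunday=6
time_of_day = ts - ts.dt.normalize() # timedelta since midnight
opening = pd.Timedelta(hours=9, minutes=30)
closing = pd.Timedelta(hours=18, minutes=30)

# How many days to roll forward for each case; -1 means "leave unchanged"
days_forward = pd.Series(
    np.select(
        [
            (weekday <= 4) & (time_of_day < opening),  # Mon-Fri before 09:30
            (weekday <= 3) & (time_of_day > closing),  # Mon-Thu after 18:30
            (weekday == 4) & (time_of_day > closing),  # Fri after 18:30
            weekday == 5,                              # Saturday
            weekday == 6,                              # Sunday
        ],
        [0, 1, 3, 2, 1],
        default=-1,
    ),
    index=df.index,
)

# Candidate value: 09:30 on the target day; keep the original where no rule applied
adjusted = ts.dt.normalize() + pd.to_timedelta(days_forward, unit="D") + opening
df["inc_cr_date_adjusted"] = adjusted.where(days_forward >= 0, ts)
```

With the three sample rows this should reproduce the adjusted values shown in the question's expected output.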
Upvotes: 1