Reputation: 1114
I am creating a new column ['inc_cr_date_adjusted'] of a df based on 2 columns of the df, the inc_cr_date_day and inc_cr_date but the the below code it's not working as expected. Doesn't give error but it's not working as defined by the conditions as only the conditions for Saturday and Sunday work. The conditions that have included hour & time do not work as the days are not summing up (Timedelta('1 days'). Other issue is that most of the dates should assume the last line of code "df['inc_cr_date'])" and do not change at all, but thats not appening also.
My conditions are based on the Day of the week and hour & minute visible in column inc_cr_date. hour&minute is 9:30 and 18:30 which is divided using the &.
Code is:
df['inc_cr_date_day'] = df['inc_cr_date'].dt.weekday_name
df['inc_cr_date_adjusted'] = np.select([(df['inc_cr_date_day'] == 'Saturday'),#condition working
(df['inc_cr_date_day'] == 'Sunday'),#condition working
((df['inc_cr_date_day'] == 'Monday')& (df['inc_cr_date'].dt.hour > 18 ) & df['inc_cr_date'].dt.minute > 30),
((df['inc_cr_date_day'] == 'Monday')& (df['inc_cr_date'].dt.hour < 9 ) & df['inc_cr_date'].dt.minute < 30),
((df['inc_cr_date_day'] == 'Tuesday')& (df['inc_cr_date'].dt.hour > 18 ) & df['inc_cr_date'].dt.minute > 30),
((df['inc_cr_date_day'] == 'Tuesday')& (df['inc_cr_date'].dt.hour < 9 ) & df['inc_cr_date'].dt.minute < 30),
((df['inc_cr_date_day'] == 'Wednesday')& (df['inc_cr_date'].dt.hour > 18 ) & df['inc_cr_date'].dt.minute > 30),
((df['inc_cr_date_day'] == 'Wednesday')& (df['inc_cr_date'].dt.hour < 9 ) & df['inc_cr_date'].dt.minute < 30),
((df['inc_cr_date_day'] == 'Thursday')& (df['inc_cr_date'].dt.hour > 18 ) & df['inc_cr_date'].dt.minute > 30),
((df['inc_cr_date_day'] == 'Thursday')& (df['inc_cr_date'].dt.hour < 9 ) & df['inc_cr_date'].dt.minute < 30),
((df['inc_cr_date_day'] == 'Friday')& (df['inc_cr_date'].dt.hour > 18 ) & df['inc_cr_date'].dt.minute > 30),
((df['inc_cr_date_day'] == 'Friday')& (df['inc_cr_date'].dt.hour < 9 ) & df['inc_cr_date'].dt.minute < 30)],
[(df['inc_cr_date']+pd.Timedelta('2 days')).dt.normalize() + pd.Timedelta('9 Hours 30 Minutes'),
(df['inc_cr_date']+pd.Timedelta('1 days')).dt.normalize() + pd.Timedelta('9 Hours 30 Minutes'),
(df['inc_cr_date']+pd.Timedelta('1 days')).dt.normalize() + pd.Timedelta('9 Hours 30 Minutes'),
(df['inc_cr_date']+pd.Timedelta('0 days')).dt.normalize() + pd.Timedelta('9 Hours 30 Minutes'),
(df['inc_cr_date']+pd.Timedelta('1 days')).dt.normalize() + pd.Timedelta('9 Hours 30 Minutes'),
(df['inc_cr_date']+pd.Timedelta('0 days')).dt.normalize() + pd.Timedelta('9 Hours 30 Minutes'),
(df['inc_cr_date']+pd.Timedelta('1 days')).dt.normalize() + pd.Timedelta('9 Hours 30 Minutes'),
(df['inc_cr_date']+pd.Timedelta('0 days')).dt.normalize() + pd.Timedelta('9 Hours 30 Minutes'),
(df['inc_cr_date']+pd.Timedelta('1 days')).dt.normalize() + pd.Timedelta('9 Hours 30 Minutes'),
(df['inc_cr_date']+pd.Timedelta('0 days')).dt.normalize() + pd.Timedelta('9 Hours 30 Minutes'),
(df['inc_cr_date']+pd.Timedelta('3 days')).dt.normalize() + pd.Timedelta('9 Hours 30 Minutes'),
(df['inc_cr_date']+pd.Timedelta('0 days')).dt.normalize() + pd.Timedelta('9 Hours 30 Minutes')],
df['inc_cr_date'])
Output (that is wrong):
inc_cr_date,inc_cr_date_day,inc_cr_date_adjusted
2017-10-26 21:59:28.075,Thursday,2017-10-26 09:30:00.000 #nok, adjusted should be 2017-10-27 and not 2017-10-26.
2017-10-21 16:49:58.722,Saturday,2017-10-23 09:30:00.000 #ok
2017-10-11 09:30:05.258,Wednesday,2017-10-11 09:30:00.000 #nok, in such situation the adjusted date should be same as inc_cr_date
thanks a lot for your inputs.
Upvotes: 0
Views: 80
Reputation: 30605
As a programmer we should try to minimize as many repetitions as possible ( Following DRY principle) . We can make use of .isin
to get your desired result i.e
#All the condtions can be reduced to one mask and result
days_one = ['Monday','Tuesday','Wednesday','Thursday']
days_two = days_one + ['Friday']
# Returns a boolean mask
m1 = df['inc_cr_date_day'].isin(days_one) & (df['inc_cr_date'].dt.hour > 18 ) & (df['inc_cr_date'].dt.minute > 30)
m2 = df['inc_cr_date_day'].isin(days_two) & (df['inc_cr_date'].dt.hour < 9 ) & (df['inc_cr_date'].dt.minute < 30)
# Repeated result can be stored in one variable
r1 = (df['inc_cr_date']+pd.Timedelta('1 days')).dt.normalize() + pd.Timedelta('9 Hours 30 Minutes')
r2 = (df['inc_cr_date']+pd.Timedelta('0 days')).dt.normalize() + pd.Timedelta('9 Hours 30 Minutes')
df['inc_cr_date_adjusted'] = np.select([
m1, m2,
(df['inc_cr_date_day'] == 'Saturday'),
(df['inc_cr_date_day'] == 'Sunday'),
((df['inc_cr_date_day'] == 'Friday')& (df['inc_cr_date'].dt.hour > 18 ) & df['inc_cr_date'].dt.minute > 30),
],
[r1, r2,
(df['inc_cr_date']+pd.Timedelta('2 days')).dt.normalize() + pd.Timedelta('9 Hours 30 Minutes'),
(df['inc_cr_date']+pd.Timedelta('1 days')).dt.normalize() + pd.Timedelta('9 Hours 30 Minutes'),
(df['inc_cr_date']+pd.Timedelta('3 days')).dt.normalize() + pd.Timedelta('9 Hours 30 Minutes')
],
df['inc_cr_date'])
Output :
inc_cr_date inc_cr_date_day inc_cr_date_adjusted 0 2017-10-26 21:59:28.075 Thursday 2017-10-27 09:30:00.000 1 2017-10-21 16:49:58.722 Saturday 2017-10-23 09:30:00.000 2 2017-10-11 09:30:05.258 Wednesday 2017-10-11 09:30:05.258
Usually that many condtions create ambiguity because of multiple matches. Hope the above code helps you get the deisred result. When I look into your code the precedence of the conditions may also matter so try to put brackets to the end condition too i.e
((df['inc_cr_date_day'] == 'Monday')& (df['inc_cr_date'].dt.hour > 18 ) & (df['inc_cr_date'].dt.minute > 30))
Upvotes: 1