Gonzalo
Gonzalo

Reputation: 1114

Conditions in a new column df with Numpy and Timedelta

I am creating a new column ['inc_cr_date_adjusted'] of a df based on 2 columns of the df, the inc_cr_date_day and inc_cr_date but the the below code it's not working as expected. Doesn't give error but it's not working as defined by the conditions as only the conditions for Saturday and Sunday work. The conditions that have included hour & time do not work as the days are not summing up (Timedelta('1 days'). Other issue is that most of the dates should assume the last line of code "df['inc_cr_date'])" and do not change at all, but thats not appening also.

My conditions are based on the Day of the week and hour & minute visible in column inc_cr_date. hour&minute is 9:30 and 18:30 which is divided using the &.

Code is:

df['inc_cr_date_day'] = df['inc_cr_date'].dt.weekday_name

df['inc_cr_date_adjusted'] = np.select([(df['inc_cr_date_day'] == 'Saturday'),#condition working
                              (df['inc_cr_date_day'] == 'Sunday'),#condition working
                              ((df['inc_cr_date_day'] == 'Monday')& (df['inc_cr_date'].dt.hour > 18 ) & df['inc_cr_date'].dt.minute > 30),
                              ((df['inc_cr_date_day'] == 'Monday')& (df['inc_cr_date'].dt.hour < 9 ) & df['inc_cr_date'].dt.minute < 30),
                              ((df['inc_cr_date_day'] == 'Tuesday')& (df['inc_cr_date'].dt.hour > 18 ) & df['inc_cr_date'].dt.minute > 30),
                              ((df['inc_cr_date_day'] == 'Tuesday')& (df['inc_cr_date'].dt.hour < 9 ) & df['inc_cr_date'].dt.minute < 30),
                              ((df['inc_cr_date_day'] == 'Wednesday')& (df['inc_cr_date'].dt.hour > 18 ) & df['inc_cr_date'].dt.minute > 30),
                              ((df['inc_cr_date_day'] == 'Wednesday')& (df['inc_cr_date'].dt.hour < 9 ) & df['inc_cr_date'].dt.minute < 30),
                              ((df['inc_cr_date_day'] == 'Thursday')& (df['inc_cr_date'].dt.hour > 18 ) & df['inc_cr_date'].dt.minute > 30),
                              ((df['inc_cr_date_day'] == 'Thursday')& (df['inc_cr_date'].dt.hour < 9 ) & df['inc_cr_date'].dt.minute < 30),
                              ((df['inc_cr_date_day'] == 'Friday')& (df['inc_cr_date'].dt.hour > 18 ) & df['inc_cr_date'].dt.minute > 30),
                              ((df['inc_cr_date_day'] == 'Friday')& (df['inc_cr_date'].dt.hour < 9 ) & df['inc_cr_date'].dt.minute < 30)],


                           [(df['inc_cr_date']+pd.Timedelta('2 days')).dt.normalize() + pd.Timedelta('9 Hours 30 Minutes'),
                            (df['inc_cr_date']+pd.Timedelta('1 days')).dt.normalize() + pd.Timedelta('9 Hours 30 Minutes'),
                            (df['inc_cr_date']+pd.Timedelta('1 days')).dt.normalize() + pd.Timedelta('9 Hours 30 Minutes'),
                            (df['inc_cr_date']+pd.Timedelta('0 days')).dt.normalize() + pd.Timedelta('9 Hours 30 Minutes'),
                            (df['inc_cr_date']+pd.Timedelta('1 days')).dt.normalize() + pd.Timedelta('9 Hours 30 Minutes'),                            
                            (df['inc_cr_date']+pd.Timedelta('0 days')).dt.normalize() + pd.Timedelta('9 Hours 30 Minutes'),
                            (df['inc_cr_date']+pd.Timedelta('1 days')).dt.normalize() + pd.Timedelta('9 Hours 30 Minutes'),
                            (df['inc_cr_date']+pd.Timedelta('0 days')).dt.normalize() + pd.Timedelta('9 Hours 30 Minutes'),
                            (df['inc_cr_date']+pd.Timedelta('1 days')).dt.normalize() + pd.Timedelta('9 Hours 30 Minutes'),
                            (df['inc_cr_date']+pd.Timedelta('0 days')).dt.normalize() + pd.Timedelta('9 Hours 30 Minutes'),
                            (df['inc_cr_date']+pd.Timedelta('3 days')).dt.normalize() + pd.Timedelta('9 Hours 30 Minutes'),
                            (df['inc_cr_date']+pd.Timedelta('0 days')).dt.normalize() + pd.Timedelta('9 Hours 30 Minutes')],

                           df['inc_cr_date'])

Output (that is wrong):

inc_cr_date,inc_cr_date_day,inc_cr_date_adjusted
2017-10-26 21:59:28.075,Thursday,2017-10-26 09:30:00.000 #nok, adjusted should be 2017-10-27 and not 2017-10-26.
2017-10-21 16:49:58.722,Saturday,2017-10-23 09:30:00.000 #ok
2017-10-11 09:30:05.258,Wednesday,2017-10-11 09:30:00.000 #nok, in such situation the adjusted date should be same as inc_cr_date

thanks a lot for your inputs.

Upvotes: 0

Views: 80

Answers (1)

Bharath M Shetty
Bharath M Shetty

Reputation: 30605

As a programmer we should try to minimize as many repetitions as possible ( Following DRY principle) . We can make use of .isin to get your desired result i.e

#All the condtions can be reduced to one mask and result 
days_one = ['Monday','Tuesday','Wednesday','Thursday']
days_two = days_one + ['Friday']

# Returns a boolean mask 
m1 = df['inc_cr_date_day'].isin(days_one) & (df['inc_cr_date'].dt.hour > 18 ) & (df['inc_cr_date'].dt.minute > 30)
m2 = df['inc_cr_date_day'].isin(days_two) & (df['inc_cr_date'].dt.hour < 9 ) & (df['inc_cr_date'].dt.minute < 30)

# Repeated result can be stored in one variable 
r1 = (df['inc_cr_date']+pd.Timedelta('1 days')).dt.normalize() + pd.Timedelta('9 Hours 30 Minutes')
r2 = (df['inc_cr_date']+pd.Timedelta('0 days')).dt.normalize() + pd.Timedelta('9 Hours 30 Minutes')


df['inc_cr_date_adjusted'] = np.select([
                          m1, m2,      
                          (df['inc_cr_date_day'] == 'Saturday'),
                          (df['inc_cr_date_day'] == 'Sunday'),
                          ((df['inc_cr_date_day'] == 'Friday')& (df['inc_cr_date'].dt.hour > 18 ) & df['inc_cr_date'].dt.minute > 30),
                          ],
                          [r1, r2,
                          (df['inc_cr_date']+pd.Timedelta('2 days')).dt.normalize() + pd.Timedelta('9 Hours 30 Minutes'),
                          (df['inc_cr_date']+pd.Timedelta('1 days')).dt.normalize() + pd.Timedelta('9 Hours 30 Minutes'),                           
                          (df['inc_cr_date']+pd.Timedelta('3 days')).dt.normalize() + pd.Timedelta('9 Hours 30 Minutes')
                          ],
                          df['inc_cr_date'])

Output :

               inc_cr_date inc_cr_date_day    inc_cr_date_adjusted
0 2017-10-26 21:59:28.075        Thursday 2017-10-27 09:30:00.000
1 2017-10-21 16:49:58.722        Saturday 2017-10-23 09:30:00.000
2 2017-10-11 09:30:05.258       Wednesday 2017-10-11 09:30:05.258

Usually that many condtions create ambiguity because of multiple matches. Hope the above code helps you get the deisred result. When I look into your code the precedence of the conditions may also matter so try to put brackets to the end condition too i.e

((df['inc_cr_date_day'] == 'Monday')& (df['inc_cr_date'].dt.hour > 18 ) & (df['inc_cr_date'].dt.minute > 30)) 

Upvotes: 1

Related Questions