Reputation: 1728
I want to calculate an employee's working hours based on a condition. Here is the sample data:
df=pd.DataFrame({'ID':[1001,1002,1003,1004,1005,1006],'In Punch':['2019-07-28 08:27:25','30-07-2019 08:10:56','05-08-2019 19:44:12','06-08-2019 08:28:51','25-08-2019 08:03:50','08-08-2019 12:44:12'],'Out Punch':['2019-07-28 08:27:25','30-07-2019 19:48:28','05-08-2019 19:44:12','06-08-2019 19:47:21','25-08-2019 19:40:05','08-08-2019 12:44:12']})
I want output like this:
     ID            In Punch           Out Punch     Hours
0  1001 2019-07-28 08:27:25 2019-07-28 08:27:25  08:00:00
1  1002 2019-07-30 08:10:56 2019-07-30 19:48:28  11:37:32
2  1003 2019-05-08 19:44:12 2019-05-08 19:44:12  04:00:00
3  1004 2019-06-08 08:28:51 2019-06-08 19:47:21  11:18:30
4  1005 2019-08-25 08:03:50 2019-08-25 19:40:05  11:36:15
5  1006 2019-08-08 12:44:12 2019-08-08 12:44:12  04:00:00
df['Hours'] should be created with these conditions:
1. If df['Out Punch'] - df['In Punch'] == 00:00:00, then check df['In Punch']:
   if df['In Punch'] is at or before 12:00, then
       df['Hours'] = pd.Timedelta(8, unit='H') (just insert/update 8 hours);
   else if df['In Punch'] is after 12:00 and up to 14:00, then
       df['Hours'] = pd.Timedelta(4, unit='H') (insert/update 4 hours);
   else
       df['Hours'] = pd.Timedelta(4, unit='H') (insert/update 4 hours).
2. If df['Out Punch'] - df['In Punch'] != 00:00:00, then
   df['Hours'] = df['Out Punch'] - df['In Punch'].
I tried this:
def create(df):
    if df['Out Punch'] - df['In Punch'] == pd.Timedelta(0):
        if pd.to_timedelta(df['In Punch']) <= pd.Timedelta(12, unit='H'):
            return pd.Timedelta(8, unit='H')
        elif pd.to_timedelta(t['In Punch']) > pd.Timedelta(12, unit='H') | pd.to_timedelta(t['In Punch']) <= pd.Timedelta(14, unit='H'):
            return pd.Timedelta(4, unit='H')
        else:
            return pd.Timedelta(4, unit='H')
    else:
        df['Out Punch'] - df['In Punch']

df['Out Punch'] = pd.to_datetime(df['Out Punch'])
df['In Punch'] = pd.to_datetime(df['In Punch'])
df['Hours'] = df.apply(create, axis=1)
But it gave this error:
ValueError: ('Value must be Timedelta, string, integer, float, timedelta or convertible', 'occurred at index 0')
Any suggestions?
Upvotes: 1
Views: 470
Reputation: 862471
Use numpy.select:
import numpy as np
import pandas as pd

# convert both columns to datetimes
df[['In Punch', 'Out Punch']] = df[['In Punch', 'Out Punch']].apply(pd.to_datetime)
# difference between Out Punch and In Punch
s = df['Out Punch'] - df['In Punch']
# convert the time of day of In Punch to timedeltas
td = pd.to_timedelta(df['In Punch'].dt.strftime('%H:%M:%S'))

# compare the difference s and the timedeltas td
# (lowercase 'h' is used because uppercase 'H' is deprecated in recent pandas)
m1 = s == pd.Timedelta(0)
m2 = td <= pd.Timedelta(12, unit='h')
m3 = (td > pd.Timedelta(12, unit='h')) & (td <= pd.Timedelta(14, unit='h'))
# everything after 14:00, so that no time of day falls through to the default
m4 = td > pd.Timedelta(14, unit='h')

# output Series
s2 = td + pd.Timedelta(8, unit='h')
s3 = td + pd.Timedelta(4, unit='h')
s4 = td - pd.Timedelta(4, unit='h')

masks = [(m1 & m2), (m1 & m3), (m1 & m4)]
vals = [s2, s3, s4]
# set output by conditions
df['Hours'] = np.select(masks, vals, default=s)
print(df)
     ID            In Punch           Out Punch     Hours
0  1001 2019-07-28 08:27:25 2019-07-28 08:27:25  16:27:25
1  1002 2019-07-30 08:10:56 2019-07-30 19:48:28  11:37:32
2  1003 2019-05-08 19:44:12 2019-05-08 19:44:12  15:44:12
3  1004 2019-06-08 08:28:51 2019-06-08 19:47:21  11:18:30
4  1005 2019-08-25 08:03:50 2019-08-25 19:40:05  11:36:15
5  1006 2019-08-08 12:44:12 2019-08-08 12:44:12  16:44:12
EDIT:
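# convert both columns to datetimes
df[['In Punch', 'Out Punch']] = df[['In Punch', 'Out Punch']].apply(pd.to_datetime)
s = df['Out Punch'] - df['In Punch']
# time of day of In Punch as timedeltas
td = pd.to_timedelta(df['In Punch'].dt.strftime('%H:%M:%S'))

m1 = s == pd.Timedelta(0)
m2 = td <= pd.Timedelta(12, unit='h')
m3 = (td > pd.Timedelta(12, unit='h')) & (td <= pd.Timedelta(14, unit='h'))
# everything after 14:00 (the final "else" branch of the question)
m4 = td > pd.Timedelta(14, unit='h')

# fixed values to insert: 8 hours and 4 hours
s2 = np.timedelta64(8, 'h')
s3 = np.timedelta64(4, 'h')

# parentheses matter: & binds tighter than |, so group (m3 | m4) explicitly
masks = [(m1 & m2), (m1 & (m3 | m4))]
vals = [s2, s3]
df['Hours'] = np.select(masks, vals, default=s)
print(df)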
     ID            In Punch           Out Punch     Hours
0  1001 2019-07-28 08:27:25 2019-07-28 08:27:25  08:00:00
1  1002 2019-07-30 08:10:56 2019-07-30 19:48:28  11:37:32
2  1003 2019-05-08 19:44:12 2019-05-08 19:44:12  04:00:00
3  1004 2019-06-08 08:28:51 2019-06-08 19:47:21  11:18:30
4  1005 2019-08-25 08:03:50 2019-08-25 19:40:05  11:36:15
5  1006 2019-08-08 12:44:12 2019-08-08 12:44:12  04:00:00
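Because both afternoon branches in the question give 4 hours, the rule can probably be reduced to a single noon cutoff. Below is a minimal sketch under that assumption. It uses Series.mask instead of numpy.select, the helper name working_hours is hypothetical, and the small frame holds three rows from the question rewritten in one ISO date format so that parsing is unambiguous.

```python
import pandas as pd


def working_hours(df):
    """Return a timedelta Series following the rule described in the question."""
    in_punch = pd.to_datetime(df['In Punch'])
    out_punch = pd.to_datetime(df['Out Punch'])
    worked = out_punch - in_punch

    # time of day of the In Punch, expressed as a timedelta since midnight
    time_of_day = in_punch - in_punch.dt.normalize()

    no_out_punch = worked == pd.Timedelta(0)
    morning = time_of_day <= pd.Timedelta(hours=12)

    # identical punches: 8 hours if punched in by noon, otherwise 4 hours
    hours = worked.mask(no_out_punch & morning, pd.Timedelta(hours=8))
    hours = hours.mask(no_out_punch & ~morning, pd.Timedelta(hours=4))
    return hours


# three rows taken from the question, normalised to ISO format (an assumption)
sample = pd.DataFrame({
    'ID': [1001, 1002, 1006],
    'In Punch': ['2019-07-28 08:27:25', '2019-07-30 08:10:56', '2019-08-08 12:44:12'],
    'Out Punch': ['2019-07-28 08:27:25', '2019-07-30 19:48:28', '2019-08-08 12:44:12'],
})
sample['Hours'] = working_hours(sample)
print(sample)
```

Rows whose two punches differ keep the plain difference, so only the rows with identical punches are overwritten.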
Upvotes: 1
Reputation: 150
You'll need to convert the dtype of the columns to something that Pandas can recognize for doing datetime arithmetic:
import pandas as pd
df['column_name'] = pd.to_datetime(df['column_name'])
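One thing to watch with the sample data: the strings mix two layouts, 'YYYY-MM-DD' and what looks like 'DD-MM-YYYY'. A single pd.to_datetime call may then read a value such as 05-08-2019 month-first (the printed tables in this thread show it as 2019-05-08), and newer pandas versions may reject the mixed column outright. The following is only a sketch that assumes the second layout really is day-first; the helper name parse_punch is hypothetical.

```python
import pandas as pd


def parse_punch(col):
    """Parse strings in either 'YYYY-MM-DD HH:MM:SS' or 'DD-MM-YYYY HH:MM:SS' layout.

    Assumption: values that are not ISO formatted are day-first.
    """
    # each pass yields NaT for strings that do not match its format
    iso = pd.to_datetime(col, format='%Y-%m-%d %H:%M:%S', errors='coerce')
    day_first = pd.to_datetime(col, format='%d-%m-%Y %H:%M:%S', errors='coerce')
    # take the ISO result where available, otherwise the day-first result
    return iso.fillna(day_first)


# values copied from the 'In Punch' column of the question
raw = pd.Series(['2019-07-28 08:27:25', '30-07-2019 08:10:56', '05-08-2019 19:44:12'])
parsed = parse_punch(raw)
print(parsed)
```

Once both columns are real datetimes, subtracting one from the other gives a timedelta column directly.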
Upvotes: 0