Reputation: 1730
I'm calculating attendance of employees, here is the sample table
df = pd.DataFrame({
'E_ID': [1001, 1001, 1001, 1002, 1002, 1002, 1002],
'Date': [
'28-07-2019 08:27:00',
'28-07-2019 18:10:00',
'29-07-2019 08:10:00',
'28-07-2019 08:07:00',
'29-07-2019 08:10:10',
'29-07-2019 08:10:17',
'29-07-2019 17:50:00'
]
})
I'm trying to calculate In-Time , Out-Time per day from the Date
column of df
.
Also there could be single entry per day which can be treated as Punch In time.
In-Time would be first punch in AND Out-Time would be last punch out.
And I'm expecting output something like this ,or may be something similar
E_ID OfficePunch In Punch Out Punch
1001 28-08-2019 28-07-2019 08:27 28-07-2019 18:10
1001 29-08-2019 29-07-2019 08:10 29-07-2019 08:10
1002 28-08-2019 28-07-2019 08:07 28-07-2019 08:07
1002 29-08-2019 29-07-2019 08:10 29-07-2019 17:50
Is there any way by which it can be done?
Upvotes: 3
Views: 82
Reputation: 862661
For python object dates in output OfficePunch
column use Series.dt.date
with aggregate by GroupBy.agg
first and last value:
df['Date'] = pd.to_datetime(df['Date'])
df1 = (df.groupby(['E_ID', df['Date'].dt.date.rename('OfficePunch')])['Date']
.agg([('In Punch','first'),('Out Punch','last')])
.reset_index())
print (df1)
E_ID OfficePunch In Punch Out Punch
0 1001 2019-07-28 2019-07-28 08:27:00 2019-07-28 18:10:00
1 1001 2019-07-29 2019-07-29 08:10:00 2019-07-29 08:10:00
2 1002 2019-07-28 2019-07-28 08:07:00 2019-07-28 08:07:00
3 1002 2019-07-29 2019-07-29 08:10:10 2019-07-29 17:50:00
print (df1.dtypes)
E_ID int64
OfficePunch object
In Punch datetime64[ns]
Out Punch datetime64[ns]
dtype: object
If need datetimes in OfficePunch
column use Series.dt.floor
:
df['Date'] = pd.to_datetime(df['Date'])
df1 = (df.groupby(['E_ID', df['Date'].dt.floor('d').rename('OfficePunch')])['Date']
.agg([('In Punch','first'),('Out Punch','last')])
.reset_index())
print (df1)
E_ID OfficePunch In Punch Out Punch
0 1001 2019-07-28 2019-07-28 08:27:00 2019-07-28 18:10:00
1 1001 2019-07-29 2019-07-29 08:10:00 2019-07-29 08:10:00
2 1002 2019-07-28 2019-07-28 08:07:00 2019-07-28 08:07:00
3 1002 2019-07-29 2019-07-29 08:10:10 2019-07-29 17:50:00
print (df1.dtypes)
E_ID int64
OfficePunch datetime64[ns]
In Punch datetime64[ns]
Out Punch datetime64[ns]
dtype: object
Upvotes: 7