Parth Pandya

Reputation: 73

Count total work hours of the employee per date in pandas

I have a pandas DataFrame like this:

Employee_id  timestamp
   1        2017-06-21 04:47:45
   1        2017-06-21 04:48:45
   1        2017-06-21 04:49:45

For each employee, I get a ping every minute while he/she is in the office. I have pings for around 2000 employees, and I need output like this:

Employee_id    date           Total_work_hour
     1      2018-06-21               8
     1      2018-06-22               7
     2      2018-06-21               6
     2      2018-06-22               8

for all 2000 employees.

Upvotes: 1

Views: 461

Answers (1)

jezrael

Reputation: 863236

Use groupby with a lambda function that takes the diff of the timestamps and sums all the differences, then convert the result to seconds with total_seconds and divide by 3600 to get hours:

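# assumes df['timestamp'] is datetime64; if not: df['timestamp'] = pd.to_datetime(df['timestamp'])
df1 = (df.groupby(['Employee_id', df['timestamp'].dt.date])['timestamp']
        .apply(lambda x: x.diff().sum())   # total time between consecutive pings
        .dt.total_seconds()                # Timedelta -> seconds
        .div(3600)                         # seconds -> hours
        .reset_index(name='Total_work_hour'))
print(df1)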
   Employee_id  timestamp  Total_work_hour
0            1 2017-06-21         0.033333
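
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the three sample rows from the question and assumes the timestamp column may arrive as strings, so it is parsed with pd.to_datetime first. The helper name total_hours_per_day and the renamed date column are made up for illustration.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "Employee_id": [1, 1, 1],
    "timestamp": ["2017-06-21 04:47:45",
                  "2017-06-21 04:48:45",
                  "2017-06-21 04:49:45"],
})
# Assumption: the column may be stored as strings, so parse it first.
df["timestamp"] = pd.to_datetime(df["timestamp"])


def total_hours_per_day(frame):
    """Hypothetical helper: sum ping-to-ping differences per employee and date."""
    # Calendar date of each ping, named so the output column reads "date".
    day = frame["timestamp"].dt.date.rename("date")
    spans = frame.groupby(["Employee_id", day])["timestamp"].apply(
        lambda s: s.diff().sum()
    )
    # Timedelta -> seconds -> hours, then back to a flat frame.
    return spans.dt.total_seconds().div(3600).reset_index(name="Total_work_hour")


result = total_hours_per_day(df)
print(result)
```

With only three pings one minute apart, the total is two minutes, which is where the 0.033333 hours comes from.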

If some consecutive minutes may be missing, you can use a custom function that drops differences longer than 60 seconds before summing:

print (df)
   Employee_id           timestamp
0            1 2017-06-21 04:47:45
1            1 2017-06-21 04:48:45
2            1 2017-06-21 04:49:45
3            1 2017-06-21 04:55:45

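def f(x):
    # differences between consecutive pings within one employee/date group
    vals = x.diff()
    # ignore gaps longer than 60 seconds, then sum what is left
    return vals.mask(vals > pd.Timedelta(60, unit='s')).sum()

df1 = (df.groupby(['Employee_id', df['timestamp'].dt.date])['timestamp']
        .apply(f)
        .dt.total_seconds()
        .div(3600)
        .reset_index(name='Total_work_hour'))
print(df1)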
   Employee_id  timestamp  Total_work_hour
0            1 2017-06-21         0.033333
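
The 60-second cutoff is strict, so a ping that arrives even slightly late would be dropped. As a rough sketch of how the threshold could be made adjustable, the version below wraps the same mask-and-sum idea in a function with a max_gap parameter. The function name hours_ignoring_gaps, the max_gap parameter, and the 10-minute value are assumptions for illustration only; the data is the four sample rows shown above.

```python
import pandas as pd

# Four sample rows from the answer, including the late 04:55:45 ping.
df = pd.DataFrame({
    "Employee_id": [1, 1, 1, 1],
    "timestamp": pd.to_datetime([
        "2017-06-21 04:47:45",
        "2017-06-21 04:48:45",
        "2017-06-21 04:49:45",
        "2017-06-21 04:55:45",
    ]),
})


def hours_ignoring_gaps(frame, max_gap=pd.Timedelta(seconds=60)):
    """Hypothetical helper: sum ping differences, skipping any above max_gap."""
    day = frame["timestamp"].dt.date.rename("date")

    def worked(s):
        gaps = s.diff()
        # Differences above the threshold become NaT and are skipped by sum().
        return gaps.mask(gaps > max_gap).sum()

    spans = frame.groupby(["Employee_id", day])["timestamp"].apply(worked)
    return spans.dt.total_seconds().div(3600).reset_index(name="Total_work_hour")


strict = hours_ignoring_gaps(df)                                      # drops the 6-minute gap
lenient = hours_ignoring_gaps(df, max_gap=pd.Timedelta(minutes=10))   # keeps it
print(strict)
print(lenient)
```

With the default threshold the 6-minute gap is ignored and only the two 1-minute differences are counted; with the lenient threshold all three differences (8 minutes in total) are summed.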

Upvotes: 5
