6seven8
6seven8

Reputation: 97

How to calculate daily and weekly over-time hours in a data-frame?

This is a tough question. I have a data-frame that looks something like this:

    Employee    Date    Hours1  Reg      OT 
0    emp1   08/12/2018  10.00   8.00    2.00    
1    emp1   08/13/2018  10.00   8.00    2.00
2    emp1   08/14/2018  10.00   8.00    2.00    
3    emp1   08/15/2018  10.00   8.00    2.00    
4    emp1   08/17/2018  10.00   8.00    2.00    
5    emp2   08/12/2018  7.00    7.00    0.00    
6    emp2   08/13/2018  9.00    8.00    1.00
7    emp2   08/15/2018  10.00   8.00    2.00
8    emp2   08/17/2018  10.00   8.00    2.00
9    emp2   08/18/2018  8.00    8.00    0.00

What I am trying to do is calculate over-time for employees in California. California law says anything over 8 hours in a single day and/or anything over 40 hours in a single week is considered over-time.

Hours1 = total hours worked that day.

Reg = np.where((df['Hours1']) < 8, df['Hours1'], (df['OT'])) (Regular time hours)

OT = np.where((df['Hours1']) > 8, (df['Hours1']) - 8, (df['OT'])) (OT hours)

The issue occurs when an employee works over 40 hours in a week and has worked over 8 hours in a single day. I am having a hard time combining the two formulas. As you can see, emp1 worked 40 hours in the first 4 days, so all hours in row=4 should be considered overtime.

For each date, I would like to add up all the hours under Hours1, for each employee, and when the amount >=40 then all subsequent hours are considered OT and, for that employee, no hours should be considered regular any longer. It feels like there should be a loop for each row, for each employee, that contains this expression but everything I have tried has not worked.

Ouput should like like this:

    Employee    Date    Hours1  Reg      OT 
0    emp1   08/12/2018  10.00   8.00    2.00    
1    emp1   08/13/2018  10.00   8.00    2.00
2    emp1   08/14/2018  10.00   8.00    2.00    
3    emp1   08/15/2018  10.00   8.00    2.00    
4    emp1   08/17/2018  10.00   0.00    10.00    
5    emp2   08/12/2018  7.00    7.00    0.00    
6    emp2   08/13/2018  9.00    8.00    1.00
7    emp2   08/15/2018  10.00   8.00    2.00
8    emp2   08/17/2018  10.00   8.00    2.00
9    emp2   08/18/2018  8.00    4.00    4.00

Any and all help is appreciated.

For future readers the key to the equation was cumsum. The final code looked like this (Note: I dropped reg hours):

df['Cum hours'] = df.groupby('Employee')['Hours1'].transform('cumsum')

df['Week OT'] = np.where(df['Cum hours'] - df['Hours1'] > 40, 
                      df['Hours1'], 
                      df['Cum hours'] - 40)

df['OT'] = np.where(df['Cum hours'] > 40, 
             df['Week OT'], 
             df['OT'])

df = df.loc[:, ['Employee', 'Date', 'Hours1', 'OT']]

Upvotes: 1

Views: 566

Answers (1)

Andrey Portnoy
Andrey Portnoy

Reputation: 1509

Assuming the dataframe only contains data from a single week, do:

cumulative_hours = df.groupby('Employee')['Hours1'].cumsum()

ot_within_day = df.OT
ot_within_week = np.where(cumulative_hours - df.Hours1 > 40, 
                          df.Hours1, 
                          cumulative_hours - 40)

ot_CA = np.where(cumulative_hours > 40, 
                 ot_within_week, 
                 ot_within_day)

Upvotes: 3

Related Questions