Ryan Hernke
Ryan Hernke

Reputation: 33

Calculating split labor hours?

I need to calculate split labor hours for some reports and have been struggling with it. Our Oracle database currently does this using a stored procedure in a package, but I am working within the confines of Snowflake/dbt for this project and while I could just bring the data from a view in Oracle over to Snowflake, I'm wanting to do some transformations on the time prior to recalculating the split time, as we frequently have people forgetting to clock out and I want to correct these to the end of their shift. So I have to recalculate split time after adjusting the erroneous clock outs.

My data looks like this:

emp_id task_id time_in time_out
43 159221 2021-09-30 09:00:00 2021-09-30 14:30:00
43 188239 2021-09-30 09:16:03 2021-09-30 10:44:51
43 177741 2021-09-30 13:12:49 2021-09-30 14:14:32

Stopping here and summing up the hours gets:

emp_id task_id hrs
43 159221 5.5
43 188239 1.47
43 177741 1.03

This is wrong because some of the time spent was shared with other tasks and should be split.

What I need to do is if we were calculating the total hours for the first task, this would be the formula:

.267 + (1.47 / 2) + 2.47 + (1.03 / 2) + .27 = 4.257 hours worked on task_id 159221.

Is there any way to handle this in SQL or will I have to learn JavaScript and write a stored procedure?

Additional details:

Upvotes: 3

Views: 170

Answers (1)

ggordon
ggordon

Reputation: 10035

You may try the following.

A LEFT JOIN was used to determine overlapping duration before aggregating the results to subtract the total overlapping duration from the actual task duration.

SELECT
     emp_id,
     task_id,
     MAX(task_duration) - SUM(overlap_duration/2) as hours
FROM (
    SELECT
        t1.emp_id,
        t1.task_id,
        t1.time_in,
        t1.time_out,
        (datediff(seconds, t1.time_in, t1.time_out)/3600) as task_duration,
         CASE
             WHEN t2.task_id IS NULL THEN 0
             ELSE (datediff(seconds,t2.time_in,t2.time_out)/3600)
         END as overlap_duration,
         t2.task_id as overlap_task
    FROM
        task_times t1
    LEFT JOIN
        task_times t2 ON t1.emp_id = t2.emp_id AND
                         t1.task_id < t2.task_id AND
                         (
                      (t1.time_in between t2.time_in and t2.time_out) or
  (t1.time_out between t2.time_in and t2.time_out) or
  (t1.time_in < t2.time_in and t1.time_out > t2.time_out) or
  (t1.time_in > t2.time_in and t1.time_out < t2.time_out)
                      )
                      
) t3
GROUP BY
    emp_id, task_id;
emp_id task_id hours
43 159221 4.245694444444444
43 177741 1.0286111111111111
43 188239 1.48

View working demo on DB Fiddle

I've also included the output of the subquery below for your perusal

SELECT
        t1.emp_id,
        t1.task_id,
        t1.time_in,
        t1.time_out,
        (datediff(seconds, t1.time_in, t1.time_out)/3600) as task_duration,
         CASE
             WHEN t2.task_id IS NULL THEN 0
             ELSE (datediff(seconds,t2.time_in,t2.time_out)/3600)
         END as overlap_duration,
         t2.task_id as overlap_task
    FROM
        task_times t1
    LEFT JOIN
        task_times t2 ON t1.emp_id = t2.emp_id AND
                         t1.task_id < t2.task_id AND
                         (
                      (t1.time_in between t2.time_in and t2.time_out) or
  (t1.time_out between t2.time_in and t2.time_out) or
  (t1.time_in < t2.time_in and t1.time_out > t2.time_out) or
  (t1.time_in > t2.time_in and t1.time_out < t2.time_out)
                      );
emp_id task_id time_in time_out task_duration overlap_duration overlap_task
43 159221 2021-09-30T09:00:00.000Z 2021-09-30T14:30:00.000Z 5.5 1.48 188239
43 159221 2021-09-30T09:00:00.000Z 2021-09-30T14:30:00.000Z 5.5 1.0286111111111111 177741
43 188239 2021-09-30T09:16:03.000Z 2021-09-30T10:44:51.000Z 1.48 0
43 177741 2021-09-30T13:12:49.000Z 2021-09-30T14:14:32.000Z 1.0286111111111111 0

View demo on DB Fiddle

Edit 1

Responding to question update

Additional details:

  1. The # of tasks that can overlap is variable, it's not only up to 2.
  2. Employees may clock into the same task multiple times throughout the day

I've added more sample data and provided an updated fiddle. I'm unable to provide a demo fiddle with snowflake but we can change the EXTRACT to DATEDIFF as shown earlier.

CREATE TABLE task_times (
  emp_id INTEGER,
  task_id INTEGER,
  time_in TIMESTAMP,
  time_out TIMESTAMP
);

INSERT INTO task_times
  (emp_id, task_id, time_in, time_out)
VALUES
-- add repetition of the same tasks with overlaps
  ('43', '159221', '2021-09-30 09:00:00', '2021-09-30 14:30:00'),
  ('43', '159221', '2021-09-30 09:03:00', '2021-09-30 14:27:00'),
  ('43', '159221', '2021-09-30 09:03:00', '2021-09-30 14:38:00'),
--- add overlap of different tasks
  ('43', '188239', '2021-09-30 09:16:03', '2021-09-30 10:44:51'),
  ('43', '188239', '2021-09-30 09:16:03', '2021-09-30 10:43:51'),
  ('43', '177741', '2021-09-30 13:12:49', '2021-09-30 14:14:32'),
-- add multiple overalapping tasks with a difference of 10 minutes
  ('43', '333333', '2021-10-30 09:00:00', '2021-10-30 12:40:00'),
  ('43', '333334', '2021-10-30 09:00:00', '2021-10-30 12:30:00'),
  ('43', '333335', '2021-10-30 09:00:00', '2021-10-30 12:20:00'),
  ('43', '333336', '2021-10-30 09:00:00', '2021-10-30 12:10:00'),
-- add multiple log in and log out times with no overlaps
  ('43', '333337', '2021-10-30 09:00:00', '2021-10-30 12:00:00'),
  ('43', '333337', '2021-10-30 12:30:00', '2021-10-30 13:00:00'),
  ('43', '333337', '2021-10-30 13:30:00', '2021-10-30 14:00:00')
  ;

Subquery to determine overlaps (for debugging purposes)

SELECT
        t1.emp_id,
        t1.task_id,
        t1.time_in,
        t1.time_out,
        (EXTRACT(epoch FROM t1.time_out-t1.time_in)/3600) as task_duration,
         CASE
             WHEN t2.task_id IS NULL THEN 0
             ELSE (EXTRACT(epoch FROM t2.time_out-t2.time_in)/3600)
         END as overlap_duration,
         t2.task_id as overlap_task
    FROM
        task_times t1
    LEFT JOIN
        task_times t2 ON t1.emp_id = t2.emp_id AND
                         t1.task_id < t2.task_id AND
                         (
                      (t1.time_in between t2.time_in and t2.time_out) or
  (t1.time_out between t2.time_in and t2.time_out) or
  (t1.time_in < t2.time_in and t1.time_out > t2.time_out) or
  (t1.time_in > t2.time_in and t1.time_out < t2.time_out)
                      );
emp_id task_id time_in time_out task_duration overlap_duration overlap_task
43 159221 2021-09-30T09:00:00.000Z 2021-09-30T14:30:00.000Z 5.5 1.48 188239
43 159221 2021-09-30T09:00:00.000Z 2021-09-30T14:30:00.000Z 5.5 1.4633333333333334 188239
43 159221 2021-09-30T09:00:00.000Z 2021-09-30T14:30:00.000Z 5.5 1.0286111111111111 177741
43 159221 2021-09-30T09:03:00.000Z 2021-09-30T14:27:00.000Z 5.4 1.48 188239
43 159221 2021-09-30T09:03:00.000Z 2021-09-30T14:27:00.000Z 5.4 1.4633333333333334 188239
43 159221 2021-09-30T09:03:00.000Z 2021-09-30T14:27:00.000Z 5.4 1.0286111111111111 177741
43 159221 2021-09-30T09:03:00.000Z 2021-09-30T14:38:00.000Z 5.583333333333333 1.48 188239
43 159221 2021-09-30T09:03:00.000Z 2021-09-30T14:38:00.000Z 5.583333333333333 1.4633333333333334 188239
43 159221 2021-09-30T09:03:00.000Z 2021-09-30T14:38:00.000Z 5.583333333333333 1.0286111111111111 177741
43 188239 2021-09-30T09:16:03.000Z 2021-09-30T10:44:51.000Z 1.48 0
43 188239 2021-09-30T09:16:03.000Z 2021-09-30T10:43:51.000Z 1.4633333333333334 0
43 177741 2021-09-30T13:12:49.000Z 2021-09-30T14:14:32.000Z 1.0286111111111111 0
43 333333 2021-10-30T09:00:00.000Z 2021-10-30T12:40:00.000Z 3.6666666666666665 3.5 333334
43 333333 2021-10-30T09:00:00.000Z 2021-10-30T12:40:00.000Z 3.6666666666666665 3.3333333333333335 333335
43 333333 2021-10-30T09:00:00.000Z 2021-10-30T12:40:00.000Z 3.6666666666666665 3.1666666666666665 333336
43 333333 2021-10-30T09:00:00.000Z 2021-10-30T12:40:00.000Z 3.6666666666666665 3 333337
43 333333 2021-10-30T09:00:00.000Z 2021-10-30T12:40:00.000Z 3.6666666666666665 0.5 333337
43 333334 2021-10-30T09:00:00.000Z 2021-10-30T12:30:00.000Z 3.5 3.3333333333333335 333335
43 333334 2021-10-30T09:00:00.000Z 2021-10-30T12:30:00.000Z 3.5 3.1666666666666665 333336
43 333334 2021-10-30T09:00:00.000Z 2021-10-30T12:30:00.000Z 3.5 3 333337
43 333334 2021-10-30T09:00:00.000Z 2021-10-30T12:30:00.000Z 3.5 0.5 333337
43 333335 2021-10-30T09:00:00.000Z 2021-10-30T12:20:00.000Z 3.3333333333333335 3.1666666666666665 333336
43 333335 2021-10-30T09:00:00.000Z 2021-10-30T12:20:00.000Z 3.3333333333333335 3 333337
43 333336 2021-10-30T09:00:00.000Z 2021-10-30T12:10:00.000Z 3.1666666666666665 3 333337
43 333337 2021-10-30T09:00:00.000Z 2021-10-30T12:00:00.000Z 3 0
43 333337 2021-10-30T12:30:00.000Z 2021-10-30T13:00:00.000Z 0.5 0
43 333337 2021-10-30T13:30:00.000Z 2021-10-30T14:00:00.000Z 0.5 0

Actual Query (I've only including the mins column for debugging purposes)

SELECT
     emp_id,
     task_id,
     MAX(CASE
         WHEN overlap_duration>0 THEN task_duration
         ELSE 0
     END ) + SUM(CASE
         WHEN overlap_duration=0 THEN task_duration
         ELSE 0
     END) - AVG(overlap_duration) as hrs,
     (MAX(CASE
         WHEN overlap_duration>0 THEN task_duration
         ELSE 0
     END ) + SUM(CASE
         WHEN overlap_duration=0 THEN task_duration
         ELSE 0
     END) - AVG(overlap_duration))*60 as mins
FROM (
    SELECT
        t1.emp_id,
        t1.task_id,
        t1.time_in,
        t1.time_out,
        (EXTRACT(epoch FROM t1.time_out-t1.time_in)/3600) as task_duration,
         CASE
             WHEN t2.task_id IS NULL THEN 0
             ELSE (EXTRACT(epoch FROM t2.time_out-t2.time_in)/3600)
         END as overlap_duration,
         t2.task_id as overlap_task
    FROM
        task_times t1
    LEFT JOIN
        task_times t2 ON t1.emp_id = t2.emp_id AND
                         t1.task_id < t2.task_id AND
                         (
                      (t1.time_in between t2.time_in and t2.time_out) or
  (t1.time_out between t2.time_in and t2.time_out) or
  (t1.time_in < t2.time_in and t1.time_out > t2.time_out) or
  (t1.time_in > t2.time_in and t1.time_out < t2.time_out)
                      )
                      
) t3
GROUP BY
    emp_id, task_id
ORDER BY
    emp_id, task_id;
emp_id task_id hrs mins
43 159221 4.259351851851852 255.5611111111111
43 177741 1.0286111111111111 61.71666666666667
43 188239 2.9433333333333334 176.6
43 333333 0.9666666666666663 57.99999999999998
43 333334 1 60
43 333335 0.25000000000000044 15.000000000000027
43 333336 0.16666666666666652 9.999999999999991
43 333337 4 240

View on DB Fiddle

Let me know if this works for you.

Upvotes: 1

Related Questions