Reputation: 33
I need to calculate split labor hours for some reports and have been struggling with it. Our Oracle database currently does this using a stored procedure in a package, but I am working within the confines of Snowflake/dbt for this project and while I could just bring the data from a view in Oracle over to Snowflake, I'm wanting to do some transformations on the time prior to recalculating the split time, as we frequently have people forgetting to clock out and I want to correct these to the end of their shift. So I have to recalculate split time after adjusting the erroneous clock outs.
My data looks like this:
emp_id | task_id | time_in | time_out |
---|---|---|---|
43 | 159221 | 2021-09-30 09:00:00 | 2021-09-30 14:30:00 |
43 | 188239 | 2021-09-30 09:16:03 | 2021-09-30 10:44:51 |
43 | 177741 | 2021-09-30 13:12:49 | 2021-09-30 14:14:32 |
Stopping here and summing up the hours gets:
emp_id | task_id | hrs |
---|---|---|
43 | 159221 | 5.5 |
43 | 188239 | 1.47 |
43 | 177741 | 1.03 |
This is wrong because some of the time spent was shared with other tasks and should be split.
What I need to do is if we were calculating the total hours for the first task, this would be the formula:
.267 + (1.47 / 2) + 2.47 + (1.03 / 2) + .27 = 4.257 hours worked on task_id 159221.
Is there any way to handle this in SQL or will I have to learn JavaScript and write a stored procedure?
Additional details:
Upvotes: 3
Views: 170
Reputation: 10035
You may try the following.
A LEFT JOIN was used to determine overlapping duration before aggregating the results to subtract the total overlapping duration from the actual task duration.
SELECT
emp_id,
task_id,
MAX(task_duration) - SUM(overlap_duration/2) as hours
FROM (
SELECT
t1.emp_id,
t1.task_id,
t1.time_in,
t1.time_out,
(datediff(seconds, t1.time_in, t1.time_out)/3600) as task_duration,
CASE
WHEN t2.task_id IS NULL THEN 0
ELSE (datediff(seconds,t2.time_in,t2.time_out)/3600)
END as overlap_duration,
t2.task_id as overlap_task
FROM
task_times t1
LEFT JOIN
task_times t2 ON t1.emp_id = t2.emp_id AND
t1.task_id < t2.task_id AND
(
(t1.time_in between t2.time_in and t2.time_out) or
(t1.time_out between t2.time_in and t2.time_out) or
(t1.time_in < t2.time_in and t1.time_out > t2.time_out) or
(t1.time_in > t2.time_in and t1.time_out < t2.time_out)
)
) t3
GROUP BY
emp_id, task_id;
emp_id | task_id | hours |
---|---|---|
43 | 159221 | 4.245694444444444 |
43 | 177741 | 1.0286111111111111 |
43 | 188239 | 1.48 |
View working demo on DB Fiddle
I've also included the output of the subquery below for your perusal
SELECT
t1.emp_id,
t1.task_id,
t1.time_in,
t1.time_out,
(datediff(seconds, t1.time_in, t1.time_out)/3600) as task_duration,
CASE
WHEN t2.task_id IS NULL THEN 0
ELSE (datediff(seconds,t2.time_in,t2.time_out)/3600)
END as overlap_duration,
t2.task_id as overlap_task
FROM
task_times t1
LEFT JOIN
task_times t2 ON t1.emp_id = t2.emp_id AND
t1.task_id < t2.task_id AND
(
(t1.time_in between t2.time_in and t2.time_out) or
(t1.time_out between t2.time_in and t2.time_out) or
(t1.time_in < t2.time_in and t1.time_out > t2.time_out) or
(t1.time_in > t2.time_in and t1.time_out < t2.time_out)
);
emp_id | task_id | time_in | time_out | task_duration | overlap_duration | overlap_task |
---|---|---|---|---|---|---|
43 | 159221 | 2021-09-30T09:00:00.000Z | 2021-09-30T14:30:00.000Z | 5.5 | 1.48 | 188239 |
43 | 159221 | 2021-09-30T09:00:00.000Z | 2021-09-30T14:30:00.000Z | 5.5 | 1.0286111111111111 | 177741 |
43 | 188239 | 2021-09-30T09:16:03.000Z | 2021-09-30T10:44:51.000Z | 1.48 | 0 | |
43 | 177741 | 2021-09-30T13:12:49.000Z | 2021-09-30T14:14:32.000Z | 1.0286111111111111 | 0 |
Responding to question update
Additional details:
- The # of tasks that can overlap is variable, it's not only up to 2.
- Employees may clock into the same task multiple times throughout the day
I've added more sample data and provided an updated fiddle. I'm unable to provide a demo fiddle with snowflake but we can change the EXTRACT
to DATEDIFF
as shown earlier.
CREATE TABLE task_times (
emp_id INTEGER,
task_id INTEGER,
time_in TIMESTAMP,
time_out TIMESTAMP
);
INSERT INTO task_times
(emp_id, task_id, time_in, time_out)
VALUES
-- add repetition of the same tasks with overlaps
('43', '159221', '2021-09-30 09:00:00', '2021-09-30 14:30:00'),
('43', '159221', '2021-09-30 09:03:00', '2021-09-30 14:27:00'),
('43', '159221', '2021-09-30 09:03:00', '2021-09-30 14:38:00'),
--- add overlap of different tasks
('43', '188239', '2021-09-30 09:16:03', '2021-09-30 10:44:51'),
('43', '188239', '2021-09-30 09:16:03', '2021-09-30 10:43:51'),
('43', '177741', '2021-09-30 13:12:49', '2021-09-30 14:14:32'),
-- add multiple overalapping tasks with a difference of 10 minutes
('43', '333333', '2021-10-30 09:00:00', '2021-10-30 12:40:00'),
('43', '333334', '2021-10-30 09:00:00', '2021-10-30 12:30:00'),
('43', '333335', '2021-10-30 09:00:00', '2021-10-30 12:20:00'),
('43', '333336', '2021-10-30 09:00:00', '2021-10-30 12:10:00'),
-- add multiple log in and log out times with no overlaps
('43', '333337', '2021-10-30 09:00:00', '2021-10-30 12:00:00'),
('43', '333337', '2021-10-30 12:30:00', '2021-10-30 13:00:00'),
('43', '333337', '2021-10-30 13:30:00', '2021-10-30 14:00:00')
;
Subquery to determine overlaps (for debugging purposes)
SELECT
t1.emp_id,
t1.task_id,
t1.time_in,
t1.time_out,
(EXTRACT(epoch FROM t1.time_out-t1.time_in)/3600) as task_duration,
CASE
WHEN t2.task_id IS NULL THEN 0
ELSE (EXTRACT(epoch FROM t2.time_out-t2.time_in)/3600)
END as overlap_duration,
t2.task_id as overlap_task
FROM
task_times t1
LEFT JOIN
task_times t2 ON t1.emp_id = t2.emp_id AND
t1.task_id < t2.task_id AND
(
(t1.time_in between t2.time_in and t2.time_out) or
(t1.time_out between t2.time_in and t2.time_out) or
(t1.time_in < t2.time_in and t1.time_out > t2.time_out) or
(t1.time_in > t2.time_in and t1.time_out < t2.time_out)
);
emp_id | task_id | time_in | time_out | task_duration | overlap_duration | overlap_task |
---|---|---|---|---|---|---|
43 | 159221 | 2021-09-30T09:00:00.000Z | 2021-09-30T14:30:00.000Z | 5.5 | 1.48 | 188239 |
43 | 159221 | 2021-09-30T09:00:00.000Z | 2021-09-30T14:30:00.000Z | 5.5 | 1.4633333333333334 | 188239 |
43 | 159221 | 2021-09-30T09:00:00.000Z | 2021-09-30T14:30:00.000Z | 5.5 | 1.0286111111111111 | 177741 |
43 | 159221 | 2021-09-30T09:03:00.000Z | 2021-09-30T14:27:00.000Z | 5.4 | 1.48 | 188239 |
43 | 159221 | 2021-09-30T09:03:00.000Z | 2021-09-30T14:27:00.000Z | 5.4 | 1.4633333333333334 | 188239 |
43 | 159221 | 2021-09-30T09:03:00.000Z | 2021-09-30T14:27:00.000Z | 5.4 | 1.0286111111111111 | 177741 |
43 | 159221 | 2021-09-30T09:03:00.000Z | 2021-09-30T14:38:00.000Z | 5.583333333333333 | 1.48 | 188239 |
43 | 159221 | 2021-09-30T09:03:00.000Z | 2021-09-30T14:38:00.000Z | 5.583333333333333 | 1.4633333333333334 | 188239 |
43 | 159221 | 2021-09-30T09:03:00.000Z | 2021-09-30T14:38:00.000Z | 5.583333333333333 | 1.0286111111111111 | 177741 |
43 | 188239 | 2021-09-30T09:16:03.000Z | 2021-09-30T10:44:51.000Z | 1.48 | 0 | |
43 | 188239 | 2021-09-30T09:16:03.000Z | 2021-09-30T10:43:51.000Z | 1.4633333333333334 | 0 | |
43 | 177741 | 2021-09-30T13:12:49.000Z | 2021-09-30T14:14:32.000Z | 1.0286111111111111 | 0 | |
43 | 333333 | 2021-10-30T09:00:00.000Z | 2021-10-30T12:40:00.000Z | 3.6666666666666665 | 3.5 | 333334 |
43 | 333333 | 2021-10-30T09:00:00.000Z | 2021-10-30T12:40:00.000Z | 3.6666666666666665 | 3.3333333333333335 | 333335 |
43 | 333333 | 2021-10-30T09:00:00.000Z | 2021-10-30T12:40:00.000Z | 3.6666666666666665 | 3.1666666666666665 | 333336 |
43 | 333333 | 2021-10-30T09:00:00.000Z | 2021-10-30T12:40:00.000Z | 3.6666666666666665 | 3 | 333337 |
43 | 333333 | 2021-10-30T09:00:00.000Z | 2021-10-30T12:40:00.000Z | 3.6666666666666665 | 0.5 | 333337 |
43 | 333334 | 2021-10-30T09:00:00.000Z | 2021-10-30T12:30:00.000Z | 3.5 | 3.3333333333333335 | 333335 |
43 | 333334 | 2021-10-30T09:00:00.000Z | 2021-10-30T12:30:00.000Z | 3.5 | 3.1666666666666665 | 333336 |
43 | 333334 | 2021-10-30T09:00:00.000Z | 2021-10-30T12:30:00.000Z | 3.5 | 3 | 333337 |
43 | 333334 | 2021-10-30T09:00:00.000Z | 2021-10-30T12:30:00.000Z | 3.5 | 0.5 | 333337 |
43 | 333335 | 2021-10-30T09:00:00.000Z | 2021-10-30T12:20:00.000Z | 3.3333333333333335 | 3.1666666666666665 | 333336 |
43 | 333335 | 2021-10-30T09:00:00.000Z | 2021-10-30T12:20:00.000Z | 3.3333333333333335 | 3 | 333337 |
43 | 333336 | 2021-10-30T09:00:00.000Z | 2021-10-30T12:10:00.000Z | 3.1666666666666665 | 3 | 333337 |
43 | 333337 | 2021-10-30T09:00:00.000Z | 2021-10-30T12:00:00.000Z | 3 | 0 | |
43 | 333337 | 2021-10-30T12:30:00.000Z | 2021-10-30T13:00:00.000Z | 0.5 | 0 | |
43 | 333337 | 2021-10-30T13:30:00.000Z | 2021-10-30T14:00:00.000Z | 0.5 | 0 |
Actual Query (I've only including the mins
column for debugging purposes)
SELECT
emp_id,
task_id,
MAX(CASE
WHEN overlap_duration>0 THEN task_duration
ELSE 0
END ) + SUM(CASE
WHEN overlap_duration=0 THEN task_duration
ELSE 0
END) - AVG(overlap_duration) as hrs,
(MAX(CASE
WHEN overlap_duration>0 THEN task_duration
ELSE 0
END ) + SUM(CASE
WHEN overlap_duration=0 THEN task_duration
ELSE 0
END) - AVG(overlap_duration))*60 as mins
FROM (
SELECT
t1.emp_id,
t1.task_id,
t1.time_in,
t1.time_out,
(EXTRACT(epoch FROM t1.time_out-t1.time_in)/3600) as task_duration,
CASE
WHEN t2.task_id IS NULL THEN 0
ELSE (EXTRACT(epoch FROM t2.time_out-t2.time_in)/3600)
END as overlap_duration,
t2.task_id as overlap_task
FROM
task_times t1
LEFT JOIN
task_times t2 ON t1.emp_id = t2.emp_id AND
t1.task_id < t2.task_id AND
(
(t1.time_in between t2.time_in and t2.time_out) or
(t1.time_out between t2.time_in and t2.time_out) or
(t1.time_in < t2.time_in and t1.time_out > t2.time_out) or
(t1.time_in > t2.time_in and t1.time_out < t2.time_out)
)
) t3
GROUP BY
emp_id, task_id
ORDER BY
emp_id, task_id;
emp_id | task_id | hrs | mins |
---|---|---|---|
43 | 159221 | 4.259351851851852 | 255.5611111111111 |
43 | 177741 | 1.0286111111111111 | 61.71666666666667 |
43 | 188239 | 2.9433333333333334 | 176.6 |
43 | 333333 | 0.9666666666666663 | 57.99999999999998 |
43 | 333334 | 1 | 60 |
43 | 333335 | 0.25000000000000044 | 15.000000000000027 |
43 | 333336 | 0.16666666666666652 | 9.999999999999991 |
43 | 333337 | 4 | 240 |
Let me know if this works for you.
Upvotes: 1