Reputation: 1179
I have a system which calculates the work time of a user on a specific task. The main issue is that the tasks take a variable amount of time to complete. Some take a few hours and some take a few days or even a week.
Each task has a start and a stop time as well as pauses in between:
+------+---------------------+---------------------+
| _id | _start_date | _finish_date |
+------+---------------------+---------------------+
| 5013 | 2021-06-28 15:23:00 | 2021-06-29 09:25:01 |
| 5020 | 2021-06-28 15:33:00 | 2021-06-28 15:35:57 |
| 5025 | 2021-06-28 15:41:00 | 2021-07-06 07:33:24 |
+------+---------------------+---------------------+
Each task can have one, several or no pauses between the start and finish date:
+------+---------+---------------------+---------------------+
| _id | _job_id | _pause_start | _pause_end |
+------+---------+---------------------+---------------------+
| 1690 | 5013 | 2021-06-28 15:26:00 | 2021-06-29 09:24:48 |
| 1700 | 5025 | 2021-06-28 16:31:00 | 2021-06-29 10:52:56 |
| 1723 | 5025 | 2021-06-29 10:54:00 | 2021-07-06 07:32:59 |
+------+---------+---------------------+---------------------+
The main issue occurs when doing analytics. Lets say we have a project who has been worked on for 3 days long (20th to 22nd of July). If I generate a report for the 21st, I only want to get the repair time on the 21st which means that I'd have to calculate the time between the pauses for that day.
So I have come up with a solution - create a view which contains daily repair time of each task:
+---------+------------+--------------+
| _job_id | _date | _repair_time |
+---------+------------+--------------+
| 123 | 2022-07-20 | 25 |
| 123 | 2022-07-21 | 180 |
| 123 | 2022-07-22 | 45 |
+---------+------------+--------------+
So that when I run the report for the 20th or 21st or 22nd of July, I'd get the repair time for that day.
Is this possible to accomplish given my table structure?
Upvotes: 2
Views: 144
Reputation: 2318
I'll give it a try and create VIEWs for each step, to make things more clear. In the end I will give an all-in-one query using CTEs, so that the real answer can be found down at step #5.
Preparation:
CREATE TABLE tasks (
id bigint NOT NULL AUTO_INCREMENT PRIMARY KEY,
task_start datetime,
task_end datetime
);
CREATE TABLE pauses (
id bigint NOT NULL AUTO_INCREMENT PRIMARY KEY,
task_id bigint NOT NULL REFERENCES tasks(id),
pause_start datetime,
pause_end datetime
);
INSERT INTO tasks(task_start, task_end) VALUES
('2022-07-02 07:45:01', NULL),
('2022-07-02 13:32:17', '2022-07-10 14:23:28');
INSERT INTO pauses(task_id, pause_start, pause_end) VALUES
(1, '2022-07-02 19:00:00', '2022-07-03 07:00:00'),
(2, '2022-07-05 14:00:00', '2022-07-05 14:05:00'),
(2, '2022-07-08 19:00:00', '2022-07-09 07:00:00');
Step #1: Transform anything to a series of events. Some task or pauses may have started but not yet ended, so fake dates are used:
CREATE OR REPLACE VIEW task_events AS
SELECT * FROM (
SELECT id AS task_id, 'task_start' AS event_type, task_start as event_time FROM tasks
UNION ALL
SELECT id, 'task_end', coalesce(task_end, date_add(task_start, INTERVAL 10 YEAR)) FROM tasks
UNION ALL
SELECT task_id, 'pause_start', pause_start FROM pauses
UNION ALL
SELECT task_id, 'pause_end', coalesce(pause_end, date_add(pause_start, INTERVAL 10 YEAR)) FROM pauses
) e
ORDER BY event_time;
Step #2: Transform the events back to periods:
CREATE OR REPLACE VIEW event_periods AS
SELECT e.*,
lag(event_type) OVER (PARTITION BY task_id ORDER BY event_time DESC) AS next_event_type,
lag(event_time) OVER (PARTITION BY task_id ORDER BY event_time DESC) AS next_event_time
FROM task_events e;
Step #3: Extract only those periods which started some work, that is whenever a task started or a pause ended.
CREATE OR REPLACE VIEW working_periods AS
SELECT * FROM event_periods WHERE event_type IN ('task_start', 'pause_end')
ORDER BY task_id, event_time;
Step #4: Use the working_periods for reporting. E. g. to get the working time between 2022-07-09 and 2022-07-11:
SELECT p.*, timediff(next_event_time, event_time) AS working_time
FROM (SELECT task_id,
event_type, greatest(event_time, '2022-07-09') AS event_time,
next_event_type, least(next_event_time, '2022-07-11') AS next_event_time
FROM working_periods
WHERE next_event_time >= '2022-07-09' and event_time <= '2022-07-11') p;
Step #5: rewrite using CTEs
WITH task_events AS (
SELECT id AS task_id, 'task_start' AS event_type, task_start as event_time FROM tasks
UNION ALL
SELECT id, 'task_end', coalesce(task_end, date_add(task_start, INTERVAL 10 YEAR)) FROM tasks
UNION ALL
SELECT task_id, 'pause_start', pause_start FROM pauses
UNION ALL
SELECT task_id, 'pause_end', coalesce(pause_end, date_add(pause_start, INTERVAL 10 YEAR)) FROM pauses
),
event_periods AS (
SELECT e.*,
lag(event_type) OVER (PARTITION BY task_id ORDER BY event_time DESC) AS next_event_type,
lag(event_time) OVER (PARTITION BY task_id ORDER BY event_time DESC) AS next_event_time
FROM task_events e
),
working_periods AS (
SELECT *
FROM event_periods WHERE event_type IN ('task_start', 'pause_end')
ORDER BY task_id, event_time
),
report_periods AS (
SELECT task_id,
event_type, greatest(event_time, '2022-07-09') AS event_time,
next_event_type, least(next_event_time, '2022-07-11') AS next_event_time
FROM working_periods
WHERE next_event_time >= '2022-07-09' and event_time <= '2022-07-11'
)
SELECT p.*, timediff(next_event_time, event_time) AS working_time
FROM report_periods p;
Result:
task_id | event_type | event_time | next_event_type | next_event_time | working_time |
---|---|---|---|---|---|
1 | pause_end | 2022-07-09 00:00:00 | task_end | 2022-07-11 00:00:00 | 48:00:00 |
2 | pause_end | 2022-07-09 07:00:00 | task_end | 2022-07-10 14:23:28 | 31:23:28 |
I'm pretty sure, it's far from being perfect, but perhaps it can serve as a starting point.
Upvotes: 3