Dimitar
Dimitar

Reputation: 1179

How to create a MYSQL View of task time per day given start date, finish date and pauses in between

I have a system which calculates the work time of a user on a specific task. The main issue is that the tasks take a variable amount of time to complete. Some take a few hours and some take a few days or even a week.

Each task has a start and a stop time as well as pauses in between:

+------+---------------------+---------------------+
| _id  | _start_date         | _finish_date        |
+------+---------------------+---------------------+
| 5013 | 2021-06-28 15:23:00 | 2021-06-29 09:25:01 |
| 5020 | 2021-06-28 15:33:00 | 2021-06-28 15:35:57 |
| 5025 | 2021-06-28 15:41:00 | 2021-07-06 07:33:24 |
+------+---------------------+---------------------+

Each task can have one, several or no pauses between the start and finish date:

+------+---------+---------------------+---------------------+
| _id  | _job_id | _pause_start        | _pause_end          |
+------+---------+---------------------+---------------------+
| 1690 |    5013 | 2021-06-28 15:26:00 | 2021-06-29 09:24:48 |
| 1700 |    5025 | 2021-06-28 16:31:00 | 2021-06-29 10:52:56 |
| 1723 |    5025 | 2021-06-29 10:54:00 | 2021-07-06 07:32:59 |
+------+---------+---------------------+---------------------+

The main issue occurs when doing analytics. Lets say we have a project who has been worked on for 3 days long (20th to 22nd of July). If I generate a report for the 21st, I only want to get the repair time on the 21st which means that I'd have to calculate the time between the pauses for that day.

So I have come up with a solution - create a view which contains daily repair time of each task:

+---------+------------+--------------+
| _job_id | _date      | _repair_time |
+---------+------------+--------------+
|     123 | 2022-07-20 |           25 |
|     123 | 2022-07-21 |          180 |
|     123 | 2022-07-22 |           45 |
+---------+------------+--------------+

So that when I run the report for the 20th or 21st or 22nd of July, I'd get the repair time for that day.

Is this possible to accomplish given my table structure?

Upvotes: 2

Views: 144

Answers (1)

Mihe
Mihe

Reputation: 2318

I'll give it a try and create VIEWs for each step, to make things more clear. In the end I will give an all-in-one query using CTEs, so that the real answer can be found down at step #5.

Preparation:

CREATE TABLE tasks (
  id bigint NOT NULL AUTO_INCREMENT PRIMARY KEY,
  task_start datetime,
  task_end datetime
);

CREATE TABLE pauses (
  id bigint NOT NULL AUTO_INCREMENT PRIMARY KEY,
  task_id bigint NOT NULL REFERENCES tasks(id),
  pause_start datetime,
  pause_end datetime
);

INSERT INTO tasks(task_start, task_end) VALUES
  ('2022-07-02 07:45:01', NULL),
  ('2022-07-02 13:32:17', '2022-07-10 14:23:28');

INSERT INTO pauses(task_id, pause_start, pause_end) VALUES
  (1, '2022-07-02 19:00:00', '2022-07-03 07:00:00'),
  (2, '2022-07-05 14:00:00', '2022-07-05 14:05:00'),
  (2, '2022-07-08 19:00:00', '2022-07-09 07:00:00');

Step #1: Transform anything to a series of events. Some task or pauses may have started but not yet ended, so fake dates are used:

CREATE OR REPLACE VIEW task_events AS 
SELECT * FROM (  
    SELECT id AS task_id, 'task_start' AS event_type, task_start as event_time FROM tasks
    UNION ALL
    SELECT id, 'task_end', coalesce(task_end, date_add(task_start, INTERVAL 10 YEAR)) FROM tasks
    UNION ALL
    SELECT task_id, 'pause_start', pause_start FROM pauses
    UNION ALL
    SELECT task_id, 'pause_end', coalesce(pause_end, date_add(pause_start, INTERVAL 10 YEAR)) FROM pauses
) e
ORDER BY event_time;

Step #2: Transform the events back to periods:

CREATE OR REPLACE VIEW event_periods AS
SELECT e.*,
  lag(event_type) OVER (PARTITION BY task_id ORDER BY event_time DESC) AS next_event_type,
  lag(event_time) OVER (PARTITION BY task_id ORDER BY event_time DESC) AS next_event_time
FROM task_events e;

Step #3: Extract only those periods which started some work, that is whenever a task started or a pause ended.

CREATE OR REPLACE VIEW working_periods AS
SELECT * FROM event_periods WHERE event_type IN ('task_start', 'pause_end')
ORDER BY task_id, event_time;

Step #4: Use the working_periods for reporting. E. g. to get the working time between 2022-07-09 and 2022-07-11:

SELECT p.*, timediff(next_event_time, event_time) AS working_time 
  FROM (SELECT task_id,
               event_type, greatest(event_time, '2022-07-09') AS event_time,
               next_event_type, least(next_event_time, '2022-07-11') AS next_event_time
          FROM working_periods
         WHERE next_event_time >= '2022-07-09' and event_time <= '2022-07-11') p;

Step #5: rewrite using CTEs

WITH task_events AS (  
        SELECT id AS task_id, 'task_start' AS event_type, task_start as event_time FROM tasks
        UNION ALL
        SELECT id, 'task_end', coalesce(task_end, date_add(task_start, INTERVAL 10 YEAR)) FROM tasks
        UNION ALL
        SELECT task_id, 'pause_start', pause_start FROM pauses
        UNION ALL
        SELECT task_id, 'pause_end', coalesce(pause_end, date_add(pause_start, INTERVAL 10 YEAR)) FROM pauses
    ),
    event_periods AS (
        SELECT e.*,
          lag(event_type) OVER (PARTITION BY task_id ORDER BY event_time DESC) AS next_event_type,
          lag(event_time) OVER (PARTITION BY task_id ORDER BY event_time DESC) AS next_event_time
        FROM task_events e
    ),
    working_periods AS (
        SELECT * 
          FROM event_periods WHERE event_type IN ('task_start', 'pause_end')
         ORDER BY task_id, event_time
    ),
    report_periods AS (
        SELECT task_id,
               event_type, greatest(event_time, '2022-07-09') AS event_time,
               next_event_type, least(next_event_time, '2022-07-11') AS next_event_time
          FROM working_periods
         WHERE next_event_time >= '2022-07-09' and event_time <= '2022-07-11'
    )
SELECT p.*, timediff(next_event_time, event_time) AS working_time
  FROM report_periods p;

Result:

task_id event_type event_time next_event_type next_event_time working_time
1 pause_end 2022-07-09 00:00:00 task_end 2022-07-11 00:00:00 48:00:00
2 pause_end 2022-07-09 07:00:00 task_end 2022-07-10 14:23:28 31:23:28

I'm pretty sure, it's far from being perfect, but perhaps it can serve as a starting point.

Upvotes: 3

Related Questions