vamyip
vamyip

Reputation: 1171

SQL distinct Worked Dates across Projects excluding Break Dates

Consider the following schema;

CREATE TABLE `Project Assignment`
    (`Employee` varchar(1), `Project Id` int, `Project Assignment Date` date, `Project Relieving Date` date)
;

INSERT INTO `Project Assignment`
    (`Employee`, `Project Id`, `Project Assignment Date`, `Project Relieving Date`)
VALUES
    ('A', 1, '2018-04-01', '2019-12-25'),
    ('A', 2, '2019-06-15', '2020-03-31'),
    ('A', 3, '2019-09-07', '2020-05-20'),
    ('A', 4, '2020-07-14', '2020-12-15')
;


CREATE TABLE `Break`
    (`Break Id` int, `Employee` varchar(1), `Project Id` int, `Break Start Date` date, `Break End Date` date)
;

INSERT INTO `Break`
    (`Break Id`, `Employee`, `Project Id`, `Break Start Date`, `Break End Date`)
VALUES
    (1, 'A', 1, '2018-09-01', '2018-09-30'),
    (2, 'A', 1, '2019-10-05', '2019-11-30'),
    (3, 'A', 2, '2019-10-15', '2019-11-15'),
    (4, 'A', 3, '2019-11-01', '2019-11-10'),
    (5, 'A', 2, '2020-01-01', '2020-01-10'),
    (6, 'A', 3, '2020-01-01', '2020-01-10')
;

During a project, an employee can take one or more breaks in each Project. The breaks don't overlap within Project but can overlap across projects.

We want Count of Days on which an Employee had at least one project assigned (minus) the days on which the employee was on break on all assigned projects.

I was able to derive Distinct Days the Employee was assigned to Projects by using below query:

SELECT merged.employee,
    SUM(DATEDIFF(merged.EndDate,merged.`Project Assignment Date`)+1) assigned_days
FROM (SELECT
        s1.employee, s1.`Project Assignment Date`,
        MIN(IFNULL(t1.`Project Relieving Date`,CURDATE())) AS EndDate
    FROM `Project Assignment` s1
    INNER JOIN `Project Assignment` t1
        ON t1.employee = s1.employee
        AND s1.`Project Assignment Date` <= IFNULL(t1.`Project Relieving Date`,CURDATE())
        AND NOT EXISTS( SELECT * FROM `Project Assignment` t2
            WHERE t2.employee = s1.employee 
                AND IFNULL(t1.`Project Relieving Date`,CURDATE()) >= t2.`Project Assignment Date` 
                AND IFNULL(t1.`Project Relieving Date`,CURDATE()) < IFNULL(t2.`Project Relieving Date`,CURDATE()))
    WHERE NOT EXISTS( SELECT * FROM `Project Assignment` s2
        WHERE s2.employee = s1.employee
            AND s1.`Project Assignment Date` > s2.`Project Assignment Date` 
            AND s1.`Project Assignment Date` <= IFNULL(s2.`Project Relieving Date`,CURDATE()))
    GROUP BY s1.employee, s1.`Project Assignment Date`
    ORDER BY s1.`Project Assignment Date`) merged
GROUP BY merged.employee

Result:

| employee | assigned_days |
| -------- | ------------- |
| A        | 936           |

But couldn't think of a way to derive days on which the person was on break on all assigned projects.

Expected Result:

+----------+---------------+------------+-------------+
| employee | assigned_days | break_days | worked_days |
+==========+===============+============+=============+
| A        | 936           | 50         | 886         |
+----------+---------------+------------+-------------+

Mariadb 10.3.29

Explanation of working out break_days

+----------+---------+-------------+------------------+-----------------+-------------------------------------------------------------------------------------------------------------------+
| Employee | Project | Break Start | Break End        | Days Considered | Remarks                                                                                                           |
+==========+=========+=============+==================+=================+===================================================================================================================+
| A        | 1       |  2018-09-01 |  2018-09-30      | 30              | Only one project assigned so consider whole break                                                                 |
+----------+---------+-------------+------------------+-----------------+-------------------------------------------------------------------------------------------------------------------+
| A        | 1       |  2019-10-05 |  2019-11-30      | 10              | 3 Projects were   assigned during these breaks. The common days of break fall between   2019-11-01 and 2019-11-10 |
+----------+---------+-------------+------------------+                 |                                                                                                                   |
| A        | 2       |  2019-10-15 |  2019-11-15      |                 |                                                                                                                   |
+----------+---------+-------------+------------------+                 |                                                                                                                   |
| A        | 3       |  2019-11-01 |  2019-11-10      |                 |                                                                                                                   |
+----------+---------+-------------+------------------+-----------------+-------------------------------------------------------------------------------------------------------------------+
| A        | 2       |  2020-01-01 |  2020-01-10      | 10              | 2 Projects were assigned during this time and break in both projects                                              |
+----------+---------+-------------+------------------+                 |                                                                                                                   |
| A        | 3       |  2020-01-01 |  2020-01-10      |                 |                                                                                                                   |
+----------+---------+-------------+------------------+-----------------+-------------------------------------------------------------------------------------------------------------------+
|          |         |             | Total Break Days | 50              |                                                                                                                   |
+----------+---------+-------------+------------------+-----------------+-------------------------------------------------------------------------------------------------------------------+

Link for DB-Fiddle: https://www.db-fiddle.com/f/c8fMneAUkhb2P3rzjMtVZm/0

Upvotes: 0

Views: 114

Answers (2)

vamyip
vamyip

Reputation: 1171

After adding Break Id column to Break table, I could building upon aggregation technique suggested by @forpass to derive break days:

Then, for each date in both cases, with aggregation get all the projects as a comma separated list with GROUP_CONCAT().

For each break get the count and list of overlapping projects(using GROUP_CONCAT). Then join it again over Break to find count and list of overlapping breaks along with the smallest common overlap (lastest start & earliest end). Use ROW_NUMBER to eliminate duplicates.

Move query for Assigned Days into another CTE and joined with CTE for breaks to get the desired result.

WITH breaks_summary AS (
    SELECT `Employee`, SUM(break_days) break_days
    FROM (      
        SELECT b.`Employee`, DATEDIFF(b.end_date, b.start_date)+1 break_days, ROW_NUMBER() OVER (PARTITION BY b.break_ids) rn, overlapping_breaks, break_ids, projects_count
        FROM (
            SELECT b_p_cnt.`Employee`, b_p_cnt.`Project Id`, b_p_cnt.projects_count, 
            COUNT(b2.`Break Id`) overlapping_breaks, GROUP_CONCAT(b2.`Break Id`) break_ids, MAX(b2.start_date) start_date, MIN(b2.end_date) end_date
            FROM (
                SELECT b1.`Break Id`, b1.`Employee`, b1.`Project Id`, b1.start_date, b1.end_date, GROUP_CONCAT(pa.`Project Id`) projects, count(pa.`Project Id`) projects_count
                FROM (
                    SELECT `Break Id`, `Employee`, `Project Id`, `Break Start Date` AS start_date, `Break End Date` AS end_date
                    FROM `Break` 
                    ) b1
                LEFT JOIN `Project Assignment` pa ON b1.`Employee` = pa.`Employee`
                    AND ((b1.start_date BETWEEN pa.`Project Assignment Date` AND IFNULL(pa.`Project Relieving Date`,CURDATE()))
                        OR (b1.end_date BETWEEN pa.`Project Assignment Date` AND IFNULL(pa.`Project Relieving Date`,CURDATE())))
                GROUP BY b1.`Break Id`, b1.`Employee`, b1.`Project Id`, b1.start_date, b1.end_date) b_p_cnt
            LEFT JOIN (
                SELECT `Break Id`, `Employee`, `Project Id`, `Break Start Date` AS start_date, `Break End Date` AS end_date
                FROM `Break`
                ORDER BY `Break Id`) b2 ON b_p_cnt.`Employee` = b2.`Employee` 
                    AND ((b_p_cnt.start_date BETWEEN b2.start_date AND b2.end_date)
                        OR (b_p_cnt.end_date BETWEEN b2.start_date AND b2.end_date))
            GROUP BY b_p_cnt.`Break Id`, b_p_cnt.`Employee`, b_p_cnt.`Project Id`, 
                b_p_cnt.start_date, b_p_cnt.end_date, b_p_cnt.projects, b_p_cnt.projects_count
            HAVING count(b2.`Break Id`) = b_p_cnt.projects_count
            ORDER BY b_p_cnt.`Employee`, `Project Id`) b        
            ) breaks
    WHERE rn = 1
    GROUP BY `Employee`),   
assigned AS (
    SELECT merged.`Employee`, SUM(DATEDIFF(merged.EndDate,merged.`Project Assignment Date`)+1) assigned_days
            FROM (SELECT s1.`Employee`, s1.`Project Assignment Date`,
                    MIN(IFNULL(t1.`Project Relieving Date`,CURDATE())) AS EndDate
                FROM `Project Assignment` s1
                INNER JOIN `Project Assignment` t1 ON t1.`Employee` = s1.`Employee`
                    AND s1.`Project Assignment Date` <= IFNULL(t1.`Project Relieving Date`,CURDATE())
                    AND NOT EXISTS( SELECT * FROM `Project Assignment` t2
                        WHERE t2.`Employee` = s1.`Employee`
                            AND IFNULL(t1.`Project Relieving Date`,CURDATE()) >= t2.`Project Assignment Date` 
                            AND IFNULL(t1.`Project Relieving Date`,CURDATE()) < IFNULL(t2.`Project Relieving Date`,CURDATE()))
                WHERE NOT EXISTS( SELECT * FROM `Project Assignment` s2
                    WHERE s2.`Employee` = s1.`Employee`
                        AND s1.`Project Assignment Date` > s2.`Project Assignment Date` 
                        AND s1.`Project Assignment Date` <= IFNULL(s2.`Project Relieving Date`,CURDATE()))
                GROUP BY s1.`Employee`, s1.`Project Assignment Date`
                ORDER BY s1.`Project Assignment Date`) merged
        GROUP BY merged.`Employee`)
SELECT ad.`Employee`,
    ad.assigned_days,
    IFNULL(bs.break_days,0) break_days,
    (ad.assigned_days - IFNULL(bs.break_days,0)) worked_days
FROM assigned ad
LEFT JOIN breaks_summary bs ON ad.`Employee` = bs.`Employee`

Updated DB-Fiddle with the query: https://www.db-fiddle.com/f/c8fMneAUkhb2P3rzjMtVZm/3

Thanks to all those who contributed by improving question as well as providing possible answers.

Upvotes: 0

forpas
forpas

Reputation: 164099

Use recursive CTEs to get all working and all break dates for each employee.
Then, for each date in both cases, with aggregation get all the projects as a comma separated list with GROUP_CONCAT().
If these lists match for a certain date then this is a break date.

WITH RECURSIVE 
  working_dates AS (
    SELECT `Employee`, `Project Id`, `Project Assignment Date` AS date, `Project Relieving Date`
    FROM `Project Assignment`
    UNION ALL
    SELECT `Employee`, `Project Id`, date + INTERVAL 1 day, `Project Relieving Date`
    FROM working_dates
    WHERE date < `Project Relieving Date`
  ),
  break_dates AS (
    SELECT `Employee`, `Project Id`, `Break Start Date` AS date, `Break End Date`
    FROM `Break`
    UNION ALL
    SELECT `Employee`, `Project Id`, date + INTERVAL 1 day, `Break End Date`
    FROM break_dates
    WHERE date < `Break End Date`
  ),
  working AS (
    SELECT `Employee`, date,
           GROUP_CONCAT(`Project Id` ORDER BY `Project Id`) projects
    FROM working_dates
    GROUP BY `Employee`, date 
  ),
  breaks AS (
    SELECT `Employee`, date,
           GROUP_CONCAT(`Project Id` ORDER BY `Project Id`) projects
    FROM break_dates
    GROUP BY `Employee`, date
  )
SELECT w.`Employee`,
       COUNT(*) assigned_days, 
       COUNT(b.date) AS break_days,
       COUNT(*) - COUNT(b.date) worked_days
FROM working w LEFT JOIN breaks b
ON w.`Employee` = b.`Employee` AND w.date = b.date AND w.projects = b.projects
GROUP BY w.`Employee`

See the demo.

Upvotes: 1

Related Questions