MrStrangerDanger
MrStrangerDanger

Reputation: 43

Eliminate Overlapping Date Ranges based on Employee and their Department with SQL Query

Eliminate Overlapping Date Ranges based on Employee and their Department with SQL Query

The Start Date is always the first day of the month and the End Date is always the last day of the month. An Employee can belong to multiple Departments. These employees are all in ONE table, but I split them up to easily display what the expected return is. Also, an employee can have a "dead" period which you can see applied in my third expected return.

EMPLOYEE_ID DEPARTMENT_ID START_DATE END_DATE
3904 215 2022-01-01 2022-01-31
3904 215 2022-02-01 2022-09-30
3904 215 2022-06-01 2022-06-30
3904 215 2022-06-01 2022-12-31
3904 215 2022-09-01 2022-09-30
3904 215 2022-09-01 2022-09-30

Expected Return:

EMPLOYEE_ID DEPARTMENT_ID START_DATE END_DATE
3904 215 2022-01-01 2022-12-31

EMPLOYEE_ID DEPARTMENT_ID START_DATE END_DATE
4923 629 2022-01-01 2022-08-31
4923 629 2022-01-01 2022-09-30
4923 629 2022-01-01 2022-12-31
4923 629 2022-08-01 2022-10-31
4923 629 2022-09-01 2022-09-30
4923 629 2022-10-01 2022-10-31
4923 629 2022-11-01 2022-12-31

Expected Return:

EMPLOYEE_ID DEPARTMENT_ID START_DATE END_DATE
4923 629 2022-01-01 2022-12-31

EMPLOYEE_ID DEPARTMENT_ID START_DATE END_DATE
8925 629 2022-01-01 2022-04-30
8925 629 2022-02-01 2022-03-31
8925 629 2022-08-01 2022-10-31
8925 629 2022-11-01 2022-12-31

Expected Return:

EMPLOYEE_ID DEPARTMENT_ID START_DATE END_DATE
8925 629 2022-01-01 2022-04-30
8925 629 2022-08-01 2022-12-31

EMPLOYEE_ID DEPARTMENT_ID START_DATE END_DATE
5877 629 2022-01-01 2022-08-31
5877 629 2022-01-01 2022-09-30
5877 629 2022-01-01 2022-12-31
5877 629 2022-08-01 2022-10-31
5877 215 2022-09-01 2022-09-30
5877 215 2022-10-01 2022-10-31
5877 215 2022-11-01 2022-12-31

Expected Return:

EMPLOYEE_ID DEPARTMENT_ID START_DATE END_DATE
5877 629 2022-01-01 2022-12-31
5877 215 2022-09-01 2022-12-31

The below query returns all possible overlapping dates for an EMPLOYEE_ID/DEPARTMENT_ID. I received the expected results.

SELECT A.EMPLOYEE_ID, A.DEPARTMENT_ID, A.START_DATE, A.END_DATE
FROM EMPLOYEES A, EMPLOYEES B
    WHERE
        A.EMPLOYEE_ID = B.EMPLOYEE_ID AND
        A.DEPARTMENT_ID = B.DEPARTMENT_ID AND
        (
            (A.START_DATE BETWEEN B.START_DATE AND B.END_DATE) OR
            (A.END_DATE BETWEEN B.START_DATE AND B.END_DATE) OR
            (A.START_DATE < B.START_DATE AND A.END_DATE > B.END_DATE) OR
            (A.START_DATE > B.START_DATE AND A.END_DATE < B.END_DATE)
        ) 
        ORDER BY A.EMPLOYEE_ID, A.DEPARTMENT_ID, A.START_DATE, A.END_DATE

The following query tries to eliminate the overlapping dates except it doesn't work as intended with my third expected return from above examples.

SELECT A.EMPLOYEE_ID, A.DEPARTMENT_ID, MIN(A.START_DATE), MAX(A.END_DATE)
FROM EMPLOYEES A, EMPLOYEES B
    WHERE
        A.EMPLOYEE_ID = B.EMPLOYEE_ID AND
        A.DEPARTMENT_ID = B.DEPARTMENT_ID AND
        (
            (A.START_DATE BETWEEN B.START_DATE AND B.END_DATE) OR
            (A.END_DATE BETWEEN B.START_DATE AND B.END_DATE) OR
            (A.START_DATE < B.START_DATE AND A.END_DATE > B.END_DATE) OR
            (A.START_DATE > B.START_DATE AND A.END_DATE < B.END_DATE)
        ) 
        GROUP BY A.EMPLOYEE_ID, A.DEPARTMENT_ID
        ORDER BY A.EMPLOYEE_ID, A.DEPARTMENT_ID, MIN(A.START_DATE), MAX(A.END_DATE)
EMPLOYEE_ID DEPARTMENT_ID START_DATE END_DATE
8925 629 2022-01-01 2022-12-31

Upvotes: 2

Views: 173

Answers (2)

Ajax1234
Ajax1234

Reputation: 71471

Using a recursive cte:

with recursive cte(e_id, d_id, s, e) as (
    select e.employee_id, e.department_id, min(e.start_date), max(e.end_date) from employees e group by e.employee_id, e.department_id
    union all
    select c.e_id, c.d_id, c.s + interval '1 day', c.e from cte c where c.s < c.e
),
runs as (
   select distinct c.e_id, c.d_id, c.s, e.start_date is not null o from cte c 
   left join employees e on e.employee_id = c.e_id and e.department_id = c.d_id and e.start_date <= c.s and c.s <= e.end_date
)
select t2.e_id, t2.d_id, t2.s_date, t2.e_date from (
   select t1.k, t1.e_id, t1.d_id, min(t1.s) s_date, max(t1.s) e_date from (
       select (select count(*) from runs r1 where r1.e_id = r.e_id and r.d_id = r1.d_id and r1.s < r.s and r.o != r1.o) k, r.* 
   from runs r) t1 
where t1.o
group by t1.k, t1.e_id, t1.d_id) t2

See fiddle.

Upvotes: 1

sretep
sretep

Reputation: 24

Try this script located in the following git hub directory

The below does integers. https://github.com/smpetersgithub/AdvancedSQLPuzzles/blob/main/Advanced%20SQL%20Puzzles/Part%20II/21%20Merge%20Overlapping%20Records.sql

This code does dates... Hope my formatting works, new to StackOverflow.

DROP TABLE IF EXISTS #TimePeriods;
DROP TABLE IF EXISTS #Distinct_StartDates;
DROP TABLE IF EXISTS #OuterJoin;
DROP TABLE IF EXISTS #DetermineValidEndDates;
DROP TABLE IF EXISTS #DetermineValidEndDates2;
GO

CREATE TABLE #TimePeriods
(
StartDate  DATE,
EndDate    DATE,
PRIMARY KEY (StartDate, EndDate)
);
GO

INSERT INTO #TimePeriods (StartDate, EndDate) VALUES
('1/1/2018','1/5/2018'),
('1/3/2018','1/9/2018'),
('1/10/2018','1/11/2018'),
('1/12/2018','1/16/2018'),
('1/15/2018','1/19/2018');
GO

--Step 1
SELECT  DISTINCT
        StartDate
INTO    #Distinct_StartDates
FROM    #TimePeriods;
GO

--Step 2
SELECT  a.StartDate AS StartDate_A,
        a.EndDate AS EndDate_A,
        b.StartDate AS StartDate_B,
        b.EndDate AS EndDate_B
INTO    #OuterJoin
FROM    #TimePeriods AS a LEFT OUTER JOIN
        #TimePeriods AS b ON a.EndDate >= b.StartDate AND
                                a.EndDate < b.EndDate;
GO

--Step 3
SELECT  EndDate_A
INTO    #DetermineValidEndDates
FROM    #OuterJoin
WHERE   StartDate_B IS NULL
GROUP BY EndDate_A;
GO

--Step 4
SELECT  a.StartDate, MIN(b.EndDate_A) AS MinEndDate_A
INTO    #DetermineValidEndDates2
FROM    #Distinct_StartDates a INNER JOIN
        #DetermineValidEndDates b ON a.StartDate <= b.EndDate_A
GROUP BY a.StartDate;
GO

--Results
SELECT  MIN(StartDate) AS StartDate,
        MAX(MinEndDate_A) AS EndDate
FROM    #DetermineValidEndDates2
GROUP BY MinEndDate_A;
GO

Upvotes: -1

Related Questions