Reputation: 43
Eliminate Overlapping Date Ranges based on Employee and their Department with SQL Query
The Start Date is always the first day of the month and the End Date is always the last day of the month. An Employee can belong to multiple Departments. These employees are all in ONE table, but I split them up to easily display what the expected return is. Also, an employee can have a "dead" period which you can see applied in my third expected return.
EMPLOYEE_ID | DEPARTMENT_ID | START_DATE | END_DATE |
---|---|---|---|
3904 | 215 | 2022-01-01 | 2022-01-31 |
3904 | 215 | 2022-02-01 | 2022-09-30 |
3904 | 215 | 2022-06-01 | 2022-06-30 |
3904 | 215 | 2022-06-01 | 2022-12-31 |
3904 | 215 | 2022-09-01 | 2022-09-30 |
3904 | 215 | 2022-09-01 | 2022-09-30 |
Expected Return:
EMPLOYEE_ID | DEPARTMENT_ID | START_DATE | END_DATE |
---|---|---|---|
3904 | 215 | 2022-01-01 | 2022-12-31 |
EMPLOYEE_ID | DEPARTMENT_ID | START_DATE | END_DATE |
---|---|---|---|
4923 | 629 | 2022-01-01 | 2022-08-31 |
4923 | 629 | 2022-01-01 | 2022-09-30 |
4923 | 629 | 2022-01-01 | 2022-12-31 |
4923 | 629 | 2022-08-01 | 2022-10-31 |
4923 | 629 | 2022-09-01 | 2022-09-30 |
4923 | 629 | 2022-10-01 | 2022-10-31 |
4923 | 629 | 2022-11-01 | 2022-12-31 |
Expected Return:
EMPLOYEE_ID | DEPARTMENT_ID | START_DATE | END_DATE |
---|---|---|---|
4923 | 629 | 2022-01-01 | 2022-12-31 |
EMPLOYEE_ID | DEPARTMENT_ID | START_DATE | END_DATE |
---|---|---|---|
8925 | 629 | 2022-01-01 | 2022-04-30 |
8925 | 629 | 2022-02-01 | 2022-03-31 |
8925 | 629 | 2022-08-01 | 2022-10-31 |
8925 | 629 | 2022-11-01 | 2022-12-31 |
Expected Return:
EMPLOYEE_ID | DEPARTMENT_ID | START_DATE | END_DATE |
---|---|---|---|
8925 | 629 | 2022-01-01 | 2022-04-30 |
8925 | 629 | 2022-08-01 | 2022-12-31 |
EMPLOYEE_ID | DEPARTMENT_ID | START_DATE | END_DATE |
---|---|---|---|
5877 | 629 | 2022-01-01 | 2022-08-31 |
5877 | 629 | 2022-01-01 | 2022-09-30 |
5877 | 629 | 2022-01-01 | 2022-12-31 |
5877 | 629 | 2022-08-01 | 2022-10-31 |
5877 | 215 | 2022-09-01 | 2022-09-30 |
5877 | 215 | 2022-10-01 | 2022-10-31 |
5877 | 215 | 2022-11-01 | 2022-12-31 |
Expected Return:
EMPLOYEE_ID | DEPARTMENT_ID | START_DATE | END_DATE |
---|---|---|---|
5877 | 629 | 2022-01-01 | 2022-12-31 |
5877 | 215 | 2022-09-01 | 2022-12-31 |
The below query returns all possible overlapping dates for an EMPLOYEE_ID/DEPARTMENT_ID. I received the expected results.
SELECT A.EMPLOYEE_ID, A.DEPARTMENT_ID, A.START_DATE, A.END_DATE
FROM EMPLOYEES A, EMPLOYEES B
WHERE
A.EMPLOYEE_ID = B.EMPLOYEE_ID AND
A.DEPARTMENT_ID = B.DEPARTMENT_ID AND
(
(A.START_DATE BETWEEN B.START_DATE AND B.END_DATE) OR
(A.END_DATE BETWEEN B.START_DATE AND B.END_DATE) OR
(A.START_DATE < B.START_DATE AND A.END_DATE > B.END_DATE) OR
(A.START_DATE > B.START_DATE AND A.END_DATE < B.END_DATE)
)
ORDER BY A.EMPLOYEE_ID, A.DEPARTMENT_ID, A.START_DATE, A.END_DATE
The following query tries to eliminate the overlapping dates except it doesn't work as intended with my third expected return from above examples.
SELECT A.EMPLOYEE_ID, A.DEPARTMENT_ID, MIN(A.START_DATE), MAX(A.END_DATE)
FROM EMPLOYEES A, EMPLOYEES B
WHERE
A.EMPLOYEE_ID = B.EMPLOYEE_ID AND
A.DEPARTMENT_ID = B.DEPARTMENT_ID AND
(
(A.START_DATE BETWEEN B.START_DATE AND B.END_DATE) OR
(A.END_DATE BETWEEN B.START_DATE AND B.END_DATE) OR
(A.START_DATE < B.START_DATE AND A.END_DATE > B.END_DATE) OR
(A.START_DATE > B.START_DATE AND A.END_DATE < B.END_DATE)
)
GROUP BY A.EMPLOYEE_ID, A.DEPARTMENT_ID
ORDER BY A.EMPLOYEE_ID, A.DEPARTMENT_ID, MIN(A.START_DATE), MAX(A.END_DATE)
EMPLOYEE_ID | DEPARTMENT_ID | START_DATE | END_DATE |
---|---|---|---|
8925 | 629 | 2022-01-01 | 2022-12-31 |
Upvotes: 2
Views: 173
Reputation: 71471
Using a recursive cte
:
with recursive cte(e_id, d_id, s, e) as (
select e.employee_id, e.department_id, min(e.start_date), max(e.end_date) from employees e group by e.employee_id, e.department_id
union all
select c.e_id, c.d_id, c.s + interval '1 day', c.e from cte c where c.s < c.e
),
runs as (
select distinct c.e_id, c.d_id, c.s, e.start_date is not null o from cte c
left join employees e on e.employee_id = c.e_id and e.department_id = c.d_id and e.start_date <= c.s and c.s <= e.end_date
)
select t2.e_id, t2.d_id, t2.s_date, t2.e_date from (
select t1.k, t1.e_id, t1.d_id, min(t1.s) s_date, max(t1.s) e_date from (
select (select count(*) from runs r1 where r1.e_id = r.e_id and r.d_id = r1.d_id and r1.s < r.s and r.o != r1.o) k, r.*
from runs r) t1
where t1.o
group by t1.k, t1.e_id, t1.d_id) t2
Upvotes: 1
Reputation: 24
Try this script located in the following git hub directory
The below does integers. https://github.com/smpetersgithub/AdvancedSQLPuzzles/blob/main/Advanced%20SQL%20Puzzles/Part%20II/21%20Merge%20Overlapping%20Records.sql
This code does dates... Hope my formatting works, new to StackOverflow.
DROP TABLE IF EXISTS #TimePeriods;
DROP TABLE IF EXISTS #Distinct_StartDates;
DROP TABLE IF EXISTS #OuterJoin;
DROP TABLE IF EXISTS #DetermineValidEndDates;
DROP TABLE IF EXISTS #DetermineValidEndDates2;
GO
CREATE TABLE #TimePeriods
(
StartDate DATE,
EndDate DATE,
PRIMARY KEY (StartDate, EndDate)
);
GO
INSERT INTO #TimePeriods (StartDate, EndDate) VALUES
('1/1/2018','1/5/2018'),
('1/3/2018','1/9/2018'),
('1/10/2018','1/11/2018'),
('1/12/2018','1/16/2018'),
('1/15/2018','1/19/2018');
GO
--Step 1
SELECT DISTINCT
StartDate
INTO #Distinct_StartDates
FROM #TimePeriods;
GO
--Step 2
SELECT a.StartDate AS StartDate_A,
a.EndDate AS EndDate_A,
b.StartDate AS StartDate_B,
b.EndDate AS EndDate_B
INTO #OuterJoin
FROM #TimePeriods AS a LEFT OUTER JOIN
#TimePeriods AS b ON a.EndDate >= b.StartDate AND
a.EndDate < b.EndDate;
GO
--Step 3
SELECT EndDate_A
INTO #DetermineValidEndDates
FROM #OuterJoin
WHERE StartDate_B IS NULL
GROUP BY EndDate_A;
GO
--Step 4
SELECT a.StartDate, MIN(b.EndDate_A) AS MinEndDate_A
INTO #DetermineValidEndDates2
FROM #Distinct_StartDates a INNER JOIN
#DetermineValidEndDates b ON a.StartDate <= b.EndDate_A
GROUP BY a.StartDate;
GO
--Results
SELECT MIN(StartDate) AS StartDate,
MAX(MinEndDate_A) AS EndDate
FROM #DetermineValidEndDates2
GROUP BY MinEndDate_A;
GO
Upvotes: -1