Reputation: 3
I have table in a SQL Server database table Department
and it has 4 columns:
staffNumber | department | startDate | endDate |
---|---|---|---|
100 | A | 2016-09-22 18:30:00.000 | 2020-02-06 18:29:00.000 |
100 | B | 2020-02-06 18:30:00.000 | 2022-08-21 18:29:00.000 |
100 | A | 2022-08-21 18:30:00.000 | 2079-12-31 18:29:00.000 |
101 | A | 2018-06-03 18:30:00.000 | 2023-08-20 18:29:00.000 |
101 | B | 2023-08-20 18:30:00.000 | 2079-12-31 18:29:00.000 |
102 | A | 2022-12-21 18:30:00.000 | 2023-12-29 18:29:00.000 |
102 | A | 2023-12-29 18:30:00.000 | 2079-12-31 18:29:00.000 |
103 | A | 2016-06-22 18:30:00.000 | 2018-03-05 18:29:00.000 |
103 | A | 2018-03-05 18:30:00.000 | 2021-03-05 18:29:00.000 |
103 | A | 2021-03-05 18:30:00.000 | 2079-12-31 18:29:00.000 |
104 | A | 2016-06-22 18:30:00.000 | 2021-03-05 18:29:00.000 |
104 | A | 2021-03-05 18:30:00.000 | 2079-12-31 18:29:00.000 |
105 | B | 2016-08-22 18:30:00.000 | 2021-02-07 18:29:00.000 |
105 | B | 2021-02-09 18:30:00.000 | 2079-12-31 18:29:00.000 |
I want to merge more then rows based on specific conditions:
staffNumber
department
expiryDate
of the current row and the effectiveDate
of the next rowThis is my SQL query I tried, but I am not getting the expected output:
WITH DeptData_CTE AS
(
SELECT
staffNumber,
department,
startDate,
endDate
LEAD(startDate) OVER (PARTITION BY staffNumber, department ORDER BY startDate) AS nextstartDate,
LAG(endDate) OVER (PARTITION BY staffNumber, department ORDER BY startDate) AS prevendDate
FROM
department
),
MergedData_CTE AS
(
SELECT
staffNumber,
department,
MIN(startDate) AS startDate,
MAX(endDate) AS endDate
FROM
DeptData_CTE
-- We merge rows where previous expiry date and current effective date are consecutive
WHERE
prevendDate IS NULL
OR DATEDIFF(day, prevendDate, startDate) != 0
GROUP BY
staffNumber,
department
)
-- Final selection of rows
SELECT
staffNumber,
department,
startDate,
endDate
FROM
MergedData_CTE
ORDER BY
staffNumber,
department,
startDate;
I am getting expected startDate
with merged rows with two consecutive rows, but not getting expected endDate
.
My expected output would be:
staffNumber | department | startDate | endDate |
---|---|---|---|
100 | A | 2016-09-22 18:30:00.000 | 2020-02-06 18:29:00.000 |
100 | B | 2020-02-06 18:30:00.000 | 2022-08-21 18:29:00.000 |
100 | A | 2022-08-21 18:30:00.000 | 2079-12-31 18:29:00.000 |
101 | A | 2018-06-03 18:30:00.000 | 2023-08-20 18:29:00.000 |
101 | B | 2023-08-20 18:30:00.000 | 2079-12-31 18:29:00.000 |
102 | A | 2022-12-21 18:30:00.000 | 2079-12-31 18:29:00.000 |
103 | A | 2016-06-22 18:30:00.000 | 2079-12-31 18:29:00.000 |
104 | A | 2016-06-22 18:30:00.000 | 2079-12-31 18:29:00.000 |
105 | B | 2016-08-22 18:30:00.000 | 2021-02-07 18:29:00.000 |
105 | B | 2021-02-09 18:30:00.000 | 2079-12-31 18:29:00.000 |
For staff numer 105, the row is not merged because date difference is >= 1 day.
I do not want to use a stored procedure. I just want to use SELECT
SQL query
Upvotes: 0
Views: 62
Reputation: 5926
This is gaps and islands
task.
Assign number for gaps. This is sum(isGap)over(partition by staffNumber order by startDate) gn
.
Then merge islands by group by
.
See example
with DeptData as(
select *
,case when department<>lag(department,1,department)
over(partition by staffNumber order by startDate) then 1
when datediff(day,lag(endDate,1,startDate)over(partition by staffNumber order by startDate)
,startDate)>0 then 1
else 0
end isGap
from department
)
,gaps as(
select *
,sum(isGap)over(partition by staffNumber order by startDate) gn
from DeptData
)
select staffNumber,department
,min(startDate)startDate,max(endDate)endDate
from gaps
group by staffNumber,department,gn
order by staffNumber,startDate,department
staffNumber | department | startDate | endDate |
---|---|---|---|
100 | A | 2016-09-22 18:30:00.000 | 2020-02-06 18:29:00.000 |
100 | B | 2020-02-06 18:30:00.000 | 2022-08-21 18:29:00.000 |
100 | A | 2022-08-21 18:30:00.000 | 2079-12-31 18:29:00.000 |
101 | A | 2018-06-03 18:30:00.000 | 2023-08-20 18:29:00.000 |
101 | B | 2023-08-20 18:30:00.000 | 2079-12-31 18:29:00.000 |
102 | A | 2022-12-21 18:30:00.000 | 2079-12-31 18:29:00.000 |
103 | A | 2016-06-22 18:30:00.000 | 2079-12-31 18:29:00.000 |
104 | A | 2016-06-22 18:30:00.000 | 2079-12-31 18:29:00.000 |
105 | B | 2016-08-22 18:30:00.000 | 2021-02-07 18:29:00.000 |
105 | B | 2021-02-09 18:30:00.000 | 2079-12-31 18:29:00.000 |
Before group by
we get this result.
There you see gaps (isGap) and islands (gn) particularily for stuffNumber.
staffNumber | department | startDate | endDate | isGap | gn |
---|---|---|---|---|---|
100 | A | 2016-09-22 18:30:00.000 | 2020-02-06 18:29:00.000 | 0 | 0 |
100 | B | 2020-02-06 18:30:00.000 | 2022-08-21 18:29:00.000 | 1 | 1 |
100 | A | 2022-08-21 18:30:00.000 | 2079-12-31 18:29:00.000 | 1 | 2 |
101 | A | 2018-06-03 18:30:00.000 | 2023-08-20 18:29:00.000 | 0 | 0 |
101 | B | 2023-08-20 18:30:00.000 | 2079-12-31 18:29:00.000 | 1 | 1 |
102 | A | 2022-12-21 18:30:00.000 | 2023-12-29 18:29:00.000 | 0 | 0 |
102 | A | 2023-12-29 18:30:00.000 | 2079-12-31 18:29:00.000 | 0 | 0 |
103 | A | 2016-06-22 18:30:00.000 | 2018-03-05 18:29:00.000 | 0 | 0 |
103 | A | 2018-03-05 18:30:00.000 | 2021-03-05 18:29:00.000 | 0 | 0 |
103 | A | 2021-03-05 18:30:00.000 | 2079-12-31 18:29:00.000 | 0 | 0 |
104 | A | 2016-06-22 18:30:00.000 | 2021-03-05 18:29:00.000 | 0 | 0 |
104 | A | 2021-03-05 18:30:00.000 | 2079-12-31 18:29:00.000 | 0 | 0 |
105 | B | 2016-08-22 18:30:00.000 | 2021-02-07 18:29:00.000 | 0 | 0 |
105 | B | 2021-02-09 18:30:00.000 | 2079-12-31 18:29:00.000 | 1 | 1 |
We can use partition by stuffNumber,department. Then gap is when days between previous endDate and current startDate >0. See example 2 in fiddle
Upvotes: 1