Reputation: 45
I have a table that contains a record for each ward stay within a hospital spell (note: a spell can include transfers to other hospitals). Spellno is the unique identifier of a spell. I would like to aggregate consecutive ward stays within a spell to hospital level. The problem I have is that if a patient goes from hospital1 to hospital2 and back to hospital1 a GROUP BY
'Spellno' and 'Hospital' would combine the two hospital1 stays, which I don't want to do.
e.g. if this was my data:
Spellno Hospital WardCode WardStart WardEnd
-------------------------------------------------------------------
123 hosp1 ward1 01/04/2015 03/04/2015
123 hosp1 ward4 03/04/2015 05/04/2015
123 hosp2 ward2 05/04/2015 07/04/2015
123 hosp1 ward3 07/04/2015 10/04/2015
123 hosp1 ward1 10/04/2015 12/04/2015
I want to aggregate on Spellno and Hospital to get:
Spellno Hospital WardStart WardEnd
-------------------------------------------------------------------
123 hosp1 01/04/2015 05/04/2015
123 hosp2 05/04/2015 07/04/2015
123 hosp1 07/04/2015 12/04/2015
Many thanks in advance.
Upvotes: 4
Views: 844
Reputation: 220902
I'm assuming that the (WardStart, WardEnd)
date ranges are strictly consecutive with no overlapping. For simplicity's sake, I'm also assuming that consecutive ranges don't exceed the max recursion default.
This can be solved using recursive SQL:
WITH
data AS (
SELECT *
FROM (
VALUES (123, 'hosp1', 'ward1', CAST('2015-04-01' AS DATE), CAST('2015-04-03' AS DATE)),
(123, 'hosp1', 'ward4', CAST('2015-04-03' AS DATE), CAST('2015-04-05' AS DATE)),
(123, 'hosp2', 'ward2', CAST('2015-04-05' AS DATE), CAST('2015-04-07' AS DATE)),
(123, 'hosp1', 'ward3', CAST('2015-04-07' AS DATE), CAST('2015-04-10' AS DATE)),
(123, 'hosp1', 'ward1', CAST('2015-04-10' AS DATE), CAST('2015-04-12' AS DATE))
) AS t(Spellno, Hospital, WardCode, WardStart, WardEnd)
),
consecutive(Spellno, Hospital, WardStart, WardEnd) AS (
SELECT Spellno, Hospital, WardStart, WardEnd
FROM data AS d1
WHERE NOT EXISTS (
SELECT *
FROM data AS d2
WHERE d1.Spellno = d2.Spellno
AND d1.Hospital = d2.Hospital
AND d1.WardStart = d2.WardEnd
)
UNION ALL
SELECT c.Spellno, c.Hospital, c.WardStart, d.WardEnd
FROM consecutive AS c
JOIN data AS d
ON c.Spellno = d.Spellno
AND c.Hospital = d.Hospital
AND c.WardEnd = d.WardStart
)
SELECT Spellno, Hospital, WardStart, MAX(WardEnd)
FROM consecutive
GROUP BY Spellno, Hospital, WardStart
ORDER BY Spellno, WardStart
The first subquery in the recursive CTE consecutive
initialises the recursion to start with all rows for which there isn't any "previous row" for the same (Spellno, Hospital)
. This produces:
Spellno Hospital WardStart WardEnd
-----------------------------------------
123 hosp1 2015-04-01 2015-04-03
123 hosp2 2015-04-05 2015-04-07
123 hosp1 2015-04-07 2015-04-10
The recursion then produces a new row with the previous row's WardStart
(which is always the same for consecutive rows) and the current WardEnd
. This produces:
Spellno Hospital WardStart WardEnd
-----------------------------------------
123 hosp1 2015-04-01 2015-04-03 <-- Unwanted, "intermediary" row
123 hosp1 2015-04-01 2015-04-05
123 hosp2 2015-04-05 2015-04-07
123 hosp1 2015-04-07 2015-04-10 <-- Unwanted, "intermediary" row
123 hosp1 2015-04-07 2015-04-12
Finally, in the outer query, we select only the maximum value of WardEnd
for each consecutive series, producing the wanted result:
Spellno Hospital WardStart WardEnd
-----------------------------------------
123 hosp1 2015-04-01 2015-04-05
123 hosp2 2015-04-05 2015-04-07
123 hosp1 2015-04-07 2015-04-12
Upvotes: 1
Reputation: 175726
You can use subquery in WHERE
clause to filter out overlapping dates ranges and second subquery in SELECT
to get range end.
SELECT Spellno, Hospital,D.WardStart,
(SELECT Min(E.WardEnd)
FROM #tab E
WHERE E.WardEnd >= D.WardEnd
AND E.Spellno = D.Spellno
AND E.Hospital = D.Hospital
AND NOT EXISTS (SELECT 1
FROM #tab E2
WHERE E.WardStart < E2.WardStart
AND E.WardEnd >= E2.WardStart
AND D.Spellno = E2.Spellno
AND D.Hospital = E2.Hospital)
) AS WardEnd
FROM #tab D
WHERE NOT EXISTS (SELECT 1
FROM #tab D2
WHERE D.WardStart <= D2.WardEnd
AND D.WardEnd > D2.WardEnd
AND D.Spellno = D2.Spellno
AND D.Hospital = D2.Hospital)
Warning:
This query performance may not be the best but it would do the work.
Output:
╔═════════╦══════════╦═════════════════════╦═════════════════════╗
║ Spellno ║ Hospital ║ WardStart ║ WardEnd ║
╠═════════╬══════════╬═════════════════════╬═════════════════════╣
║ 123 ║ hosp1 ║ 2015-04-01 00:00:00 ║ 2015-04-05 00:00:00 ║
║ 123 ║ hosp2 ║ 2015-04-05 00:00:00 ║ 2015-04-07 00:00:00 ║
║ 123 ║ hosp1 ║ 2015-04-07 00:00:00 ║ 2015-04-12 00:00:00 ║
╚═════════╩══════════╩═════════════════════╩═════════════════════╝
Upvotes: 4