MarkB
MarkB

Reputation: 45

SQL Server: only group consecutive records when using GROUP BY

I have a table that contains a record for each ward stay within a hospital spell (note: a spell can include transfers to other hospitals). Spellno is the unique identifier of a spell. I would like to aggregate consecutive ward stays within a spell to hospital level. The problem I have is that if a patient goes from hospital1 to hospital2 and back to hospital1 a GROUP BY 'Spellno' and 'Hospital' would combine the two hospital1 stays, which I don't want to do.

e.g. if this was my data:

Spellno   Hospital   WardCode   WardStart   WardEnd 
-------------------------------------------------------------------
123       hosp1      ward1      01/04/2015  03/04/2015
123       hosp1      ward4      03/04/2015  05/04/2015
123       hosp2      ward2      05/04/2015  07/04/2015
123       hosp1      ward3      07/04/2015  10/04/2015
123       hosp1      ward1      10/04/2015  12/04/2015

I want to aggregate on Spellno and Hospital to get:

Spellno   Hospital   WardStart   WardEnd 
-------------------------------------------------------------------
123       hosp1      01/04/2015  05/04/2015
123       hosp2      05/04/2015  07/04/2015
123       hosp1      07/04/2015  12/04/2015

Many thanks in advance.

Upvotes: 4

Views: 844

Answers (2)

Lukas Eder
Lukas Eder

Reputation: 220902

I'm assuming that the (WardStart, WardEnd) date ranges are strictly consecutive with no overlapping. For simplicity's sake, I'm also assuming that consecutive ranges don't exceed the max recursion default.

This can be solved using recursive SQL:

WITH 
  data AS (
    SELECT * 
    FROM (
      VALUES (123, 'hosp1', 'ward1', CAST('2015-04-01' AS DATE), CAST('2015-04-03' AS DATE)),
             (123, 'hosp1', 'ward4', CAST('2015-04-03' AS DATE), CAST('2015-04-05' AS DATE)),
             (123, 'hosp2', 'ward2', CAST('2015-04-05' AS DATE), CAST('2015-04-07' AS DATE)),
             (123, 'hosp1', 'ward3', CAST('2015-04-07' AS DATE), CAST('2015-04-10' AS DATE)),
             (123, 'hosp1', 'ward1', CAST('2015-04-10' AS DATE), CAST('2015-04-12' AS DATE))
    ) AS t(Spellno, Hospital, WardCode, WardStart, WardEnd)
  ),
  consecutive(Spellno, Hospital, WardStart, WardEnd) AS (
    SELECT Spellno, Hospital, WardStart, WardEnd
    FROM data AS d1
    WHERE NOT EXISTS (
      SELECT *
      FROM data AS d2
      WHERE d1.Spellno = d2.Spellno
      AND d1.Hospital = d2.Hospital
      AND d1.WardStart = d2.WardEnd
    )
    UNION ALL
    SELECT c.Spellno, c.Hospital, c.WardStart, d.WardEnd
    FROM consecutive AS c
    JOIN data AS d
    ON c.Spellno = d.Spellno
    AND c.Hospital = d.Hospital
    AND c.WardEnd = d.WardStart
  )
SELECT Spellno, Hospital, WardStart, MAX(WardEnd)
FROM consecutive
GROUP BY Spellno, Hospital, WardStart
ORDER BY Spellno, WardStart

Demo

Explanation

The first subquery in the recursive CTE consecutive initialises the recursion to start with all rows for which there isn't any "previous row" for the same (Spellno, Hospital). This produces:

Spellno  Hospital  WardStart   WardEnd
-----------------------------------------
123      hosp1     2015-04-01  2015-04-03
123      hosp2     2015-04-05  2015-04-07
123      hosp1     2015-04-07  2015-04-10

The recursion then produces a new row with the previous row's WardStart (which is always the same for consecutive rows) and the current WardEnd. This produces:

Spellno  Hospital  WardStart   WardEnd
-----------------------------------------
123      hosp1     2015-04-01  2015-04-03 <-- Unwanted, "intermediary" row
123      hosp1     2015-04-01  2015-04-05
123      hosp2     2015-04-05  2015-04-07
123      hosp1     2015-04-07  2015-04-10 <-- Unwanted, "intermediary" row
123      hosp1     2015-04-07  2015-04-12

Finally, in the outer query, we select only the maximum value of WardEnd for each consecutive series, producing the wanted result:

Spellno  Hospital  WardStart   WardEnd
-----------------------------------------
123      hosp1     2015-04-01  2015-04-05
123      hosp2     2015-04-05  2015-04-07
123      hosp1     2015-04-07  2015-04-12

Upvotes: 1

Lukasz Szozda
Lukasz Szozda

Reputation: 175726

You can use subquery in WHERE clause to filter out overlapping dates ranges and second subquery in SELECT to get range end.

SELECT Spellno, Hospital,D.WardStart,
   (SELECT Min(E.WardEnd)
    FROM #tab E
    WHERE E.WardEnd >= D.WardEnd
      AND E.Spellno = D.Spellno
      AND E.Hospital = D.Hospital
      AND NOT EXISTS (SELECT 1
                      FROM #tab E2
                      WHERE E.WardStart < E2.WardStart
                        AND E.WardEnd >= E2.WardStart
                        AND D.Spellno = E2.Spellno
                        AND D.Hospital = E2.Hospital)
  ) AS WardEnd
FROM #tab D
WHERE NOT EXISTS (SELECT 1
                  FROM #tab D2
                  WHERE D.WardStart <= D2.WardEnd
                    AND D.WardEnd > D2.WardEnd
                    AND D.Spellno = D2.Spellno
                    AND D.Hospital = D2.Hospital)

Warning:

This query performance may not be the best but it would do the work.

LiveDemo

Output:

╔═════════╦══════════╦═════════════════════╦═════════════════════╗
║ Spellno ║ Hospital ║      WardStart      ║       WardEnd       ║
╠═════════╬══════════╬═════════════════════╬═════════════════════╣
║     123 ║ hosp1    ║ 2015-04-01 00:00:00 ║ 2015-04-05 00:00:00 ║
║     123 ║ hosp2    ║ 2015-04-05 00:00:00 ║ 2015-04-07 00:00:00 ║
║     123 ║ hosp1    ║ 2015-04-07 00:00:00 ║ 2015-04-12 00:00:00 ║
╚═════════╩══════════╩═════════════════════╩═════════════════════╝

Upvotes: 4

Related Questions