shockwave
shockwave

Reputation: 3262

SQL consolidate overlapping dates based on criteria

I'm trying to merge overlapping dates between Admit and discharge dates of patients. There are a few edge cases which I couldn't cover in the query.

Input

+----+------------+--------------+
| ID | Admit_Dt   | Discharge_Dt |
+----+------------+--------------+
| 1  | 12/30/2020 | 07/14/2021   |
+----+------------+--------------+
| 1  | 01/02/2021 | 07/14/2021   |
+----+------------+--------------+
| 1  | 06/16/2021 | 07/14/2021   |
+----+------------+--------------+
| 2  | 03/04/2021 | 03/25/2021   |
+----+------------+--------------+
| 2  | 05/01/2021 | 05/10/2021   |
+----+------------+--------------+
| 3  | 06/01/2021 | 06/05/2021   |
+----+------------+--------------+

Expected Output

+----+------------+--------------+
| ID | Admit_dt   | Discharge_dt |
+----+------------+--------------+
| 1  | 12/30/2020 | 07/14/2021   |
+----+------------+--------------+
| 2  | 03/04/2021 | 03/25/2021   |
+----+------------+--------------+
| 2  | 05/01/2021 | 05/10/2021   |
+----+------------+--------------+
| 3  | 06/01/2021 | 06/05/2021   |
+----+------------+--------------+

Query I used the logic that was here But this doesn't cover the edge case for ID 2 and 3. Also the subquery is slower when the data is huge. Is it possible to tackle this problem using LAG?

SELECT dr1.* FROM Member_Discharges dr1
INNER JOIN Member_Discharges dr2
ON dr2.ADMIT_DT> dr1.ADMIT_DT 
  and dr2.ADMIT_DT< dr1.DISCHARGE_DT

Upvotes: 0

Views: 117

Answers (2)

Gordon Linoff
Gordon Linoff

Reputation: 1269463

This is a type of gaps-and-islands problem. I would suggest using a cumulative max to determine when an "island" starts and then aggregate:

select id, min(admit_dt), max(discharge_dt)
from (select t.*,
             sum(case when prev_Discharge_dt >= Admit_Dt then 0 else 1 end) over (partition by id order by admit_dt, discharge_dt) as grp
      from (select t.*,
                   max(Discharge_dt) over (partition by id
                                           order by Admit_Dt, Discharge_dt
                                           rows between unbounded preceding and 1 preceding) as prev_Discharge_dt
            from t
           ) t
     ) t
group by id, grp;

Here is a db<>fiddle.

The innermost subquery is retrieving the maximum discharge date before each row. This allows you to check for an overlap. The middle subquery counts up the number of times there is no overlap -- the beginning of a group. And the outer query aggregates.

Upvotes: 3

Charlieface
Charlieface

Reputation: 71159

Here is another "gaps-and-islands" solution

  • Use LAG to determine if the previous Discharge_Dt is earlier than the current Admit_Dt, if so we have a starting point
  • Number the islands using COUNT OVER
  • Group by the ID and the new grouping number, and take the min and max dates
WITH StartPoints AS (
    SELECT *,
      IsStart = CASE WHEN LAG(Discharge_Dt, 1, '19000101')
                  OVER (PARTITION BY ID ORDER BY Admit_Dt)
                < Admit_Dt THEN 1 END
    FROM YourTable t
),
Groupings AS (
    SELECT *,
      GroupId = COUNT(IsStart) OVER (PARTITION BY ID
                    ORDER BY Admit_Dt ROWS UNBOUNDED PRECEDING)
    FROM StartPoints
)
SELECT ID, Admit_Dt = MIN(Admit_Dt), Discharge_Dt = MAX(Discharge_Dt)
FROM Groupings
GROUP BY ID, GroupId
ORDER BY ID, GroupId;

db<>fiddle

Upvotes: 0

Related Questions