CandleWax
CandleWax

Reputation: 2219

How to properly apply recursive CTE?

I have a table that contains hospital visits for patients. I am trying to flag visits in which a visits' begin_date overlaps the previous visits' end_date + 90 days. However, the caveat to this is that once a visit is flagged as an overlap visit, that visit should not be used to assess an overlap with another visit. Let me explain with an example.

Table

visitID     patientid    begin_date  end_date
1           23           1/12/2018   1/14/2018
2           23           1/30/2018   2/14/2018
3           23           4/20/2018   4/22/2018
4           23           5/02/2018   5/03/2018
5           23           7/23/2018   7/28/2018

In the example above, the patient had 5 visits. Visit 2's begin_date was in range of visit 1's end_date + 90 days, so visit 2 should be flagged. Once visit 2 is flagged, that row should not be used in the analysis for any future visits. Conceptually, it would be like removing visit 2 and beginning the analysis again.

interim stage (visit 2 is removed, and analysis begins again)

visitID     patientid    begin_date  end_date
1           23           1/12/2018   1/14/2018
3           23           4/20/2018   4/22/2018
4           23           5/02/2018   5/03/2018
5           23           7/23/2018   7/28/2018

So even though visit 3 overlaps with visit 2, since visit 2 has been removed, visit 3 will not be flagged as the previous visit (now visit 1) is more than end_date + 90 days away from visit 3's begin_date. Then, visit 4 should be flagged as it overlaps with a visit that was not flagged (visit 3). So since visit 4 is flagged, then visit 5 will be removed as it's begin_date is in the range of visit 3's end_date + 90 days.

Anticipated output

visitID     patientid    begin_date  end_date    flag
1           23           1/12/2018   1/14/2018   0
2           23           1/30/2018   2/14/2018   1
3           23           4/20/2018   4/22/2018   0
4           23           5/02/2018   5/03/2018   1
5           23           7/23/2018   7/28/2018   1

@gordonlinoff answered a very similar question here, but I am running into issues using recursive CTEs. The difference between the questions is that this question needs to reference another column (end_date), rather than a single date column. Recursive CTEs are still a new concept to me, but I hope this will help solidify the concept.

My attempt to solve this puzzle (piggy backing off of @gordonlinoff):

with vt as (
          select vt.*, row_number() over (partition by patientid order by begin_date) as seqnum
          from visits_table vt
         ),
         cte as (
          select vt.visit, vt.patientid, vt.begin_date, vt.end_date, vt.begin_date as first_begin_date, seqnum
          from vt
          where seqnum = 1
          union all
          select vt.visit, vt.patientid, vt.begin_date, vt.end_date,
                 (case when vt.begin_date > dateadd(day, 90, cte.end_date) then vt.begin_date else cte.end_date end),
                 vt.seqnum
          from cte join
               vt
               on vt.seqnum = cte.seqnum + 1 and vt.patientid = cte.patientid
         )
    select cte.visit, cte.patientid, cte.begin_date, cte.end_date,
           (case when first_begin_date = begin_date then 0 else 1 end) as flag
    from cte
    order by cte.patientid, cte.begin_date;

My edits are improperly referencing the end_date based on the results. However, I cannot find where the comparison between begin_date and end_date should be.

Dataset:

create table visits_table (visit int,patientid int,begin_date date, end_date date);

INSERT INTO visits_table (visit, patientid, begin_date, end_date) VALUES (1,23,'1/12/2018','1/14/2018')
INSERT INTO visits_table (visit, patientid, begin_date, end_date) VALUES (2,23,'1/30/2018','2/14/2018')
INSERT INTO visits_table (visit, patientid, begin_date, end_date) VALUES (3,23,'4/20/2018','4/22/2018')
INSERT INTO visits_table (visit, patientid, begin_date, end_date) VALUES (4,23,'5/02/2018','5/03/2018')

Upvotes: 6

Views: 127

Answers (1)

Vladimir Baranov
Vladimir Baranov

Reputation: 32695

I adjusted your sample data to make visit 5 in the range of visit 3's end_date + 90 days. Visit 3 end date is 2018-04-22. If we add 90 days to it, it will be 2018-07-21. Your sample data in the question has visit 5 start date as 2018-07-23, which doesn't overlap with 2018-07-21. So, I adjusted this to be 2018-07-20 to make these dates overlap.

create table visits_table (visit int,patientid int,begin_date date, end_date date);

INSERT INTO visits_table (visit, patientid, begin_date, end_date) VALUES 
(1,23,'2018-01-12','2018-01-14'),
(2,23,'2018-01-30','2018-02-14'),
(3,23,'2018-04-20','2018-04-22'),
(4,23,'2018-05-02','2018-05-03'),
(5,23,'2018-07-20','2018-07-28');

You query was pretty close, you just had to calculate both start and end dates of the "previous" interval (first_begin_date, first_end_date).

If the "current" interval overlaps with the "previous", you carry over the "previous" interval into the current row.

Uncomment the lines in the query below to see all intermediate values.

with
vt
as
(
    select vt.*, row_number() over (partition by patientid order by begin_date) as seqnum
    from visits_table vt
)
,cte
as
(
    select
        vt.visit
        ,vt.patientid
        ,vt.begin_date as first_begin_date
        ,vt.end_date as first_end_date
        ,vt.begin_date
        ,vt.end_date
        ,seqnum
    from vt
    where seqnum = 1

    union all

    select
        vt.visit
        ,vt.patientid
        ,case when vt.begin_date <= dateadd(day, 90, cte.first_end_date)
            then cte.first_begin_date -- they overlap, keep the previous interval
            else vt.begin_date
        end as first_begin_date
        ,case when vt.begin_date <= dateadd(day, 90, cte.first_end_date)
            then cte.first_end_date -- they overlap, keep the previous interval
            else vt.end_date
        end as first_end_date
        ,vt.begin_date
        ,vt.end_date
        ,vt.seqnum
    from
        cte
        inner join vt
            on  vt.seqnum = cte.seqnum + 1
            and vt.patientid = cte.patientid
)
select
    cte.visit
    ,cte.patientid
    ,cte.begin_date
    ,cte.end_date
    ,case when first_begin_date = begin_date 
        then 0 else 1
    end as flag
--  ,DATEADD(day, 90, cte.end_date) AS enddd
--  ,*
from cte
order by cte.patientid, cte.begin_date;

Result

+-------+-----------+------------+------------+------+
| visit | patientid | begin_date |  end_date  | flag |
+-------+-----------+------------+------------+------+
|     1 |        23 | 2018-01-12 | 2018-01-14 |    0 |
|     2 |        23 | 2018-01-30 | 2018-02-14 |    1 |
|     3 |        23 | 2018-04-20 | 2018-04-22 |    0 |
|     4 |        23 | 2018-05-02 | 2018-05-03 |    1 |
|     5 |        23 | 2018-07-20 | 2018-07-28 |    1 |
+-------+-----------+------------+------------+------+

Upvotes: 1

Related Questions