rambles
rambles

Reputation: 706

Processing dates into 'episodes' following a step-wise algorithm

I have a table of patients of whom some have one or more 'events' (basically dates). I need to process these dates into episodes using the following steps:-

  1. Following an event, project forwards 14 days and use as a potential end-of-episode date.
    a. Events within this period are ignored, assumed related to the initial events.

  2. If there are no events in the 14 days after the potential end-of-episode date, halt the algorithm - the episode has been fully identified.

  3. If there is an event in the 14 days after the potential end-of-episode date, move potential end-of-episode date to new event date + 14 days.

  4. Repeat from step 2 until there is 14 days from the end-of-episode date clear of events.;

Below is some test data. The 'real' data is patid (patient identifier), eventdate and rn (a unique row number, reset to 1 for each patient, applied in date order). Exp_end is my own entry to indicate the end date of episode. It's on the row that would be the last event for the episode:-

"patid","rn", "eventdate",   "exp_end"
     1,   1,  01-01-2010,  14-01-2010
     1,   2,  11-02-2010,  24-02-2010
     2,   1,  01-01-2010,  
     2,   2,  07-01-2010,  
     2,   3,  12-01-2010,  14-01-2010
     2,   4,  21-02-2010,  06-03-2010
     3,   1,  01-01-2010,  
     3,   2,  12-01-2010,  
     3,   3,  24-01-2010,  06-02-2010
     3,   4,  21-02-2010,  06-03-2010
     4,   1,  01-01-2010,  
     4,   2,  24-01-2010,  
     4,   3,  28-01-2010,  
     4,   4,  04-02-2010,  17-02-2010
     4,   5,  21-02-2010,  06-03-2010
     5,   1,  26-02-2004,  10-03-2004
     5,   2,  03-09-2004,  
     5,   3,  09-09-2004,
     5,   4,  21-09-2004,  04-10-2004
     5,   5,  05-10-2004,
     5,   6,  14-10-2004,
     5,   7,  28-10-2004,  10-11-2004

The SQL code produces the same table:-

CREATE TABLE dp.ep( eventdate DATE FORMAT 'YY/MM/DD', exp_end DATE FORMAT 'YY/MM/DD', patid INTEGER, rn INTEGER) PRIMARY INDEX pats ( patid );
INSERT INTO db.ep (eventdate, exp_end, patid, rn) VALUES(TO_DATE('2010-01-01','YYYY-MM-DD'), NULL, 2.0, 1.0);
INSERT INTO db.ep (eventdate, exp_end, patid, rn) VALUES(TO_DATE('2010-02-04','YYYY-MM-DD'), TO_DATE('2010-02-17','YYYY-MM-DD'), 4.0, 4.0);
INSERT INTO db.ep (eventdate, exp_end, patid, rn) VALUES(TO_DATE('2010-01-12','YYYY-MM-DD'), TO_DATE('2010-01-14','YYYY-MM-DD'), 2.0, 3.0);
INSERT INTO db.ep (eventdate, exp_end, patid, rn) VALUES(TO_DATE('2010-02-21','YYYY-MM-DD'), TO_DATE('2010-03-06','YYYY-MM-DD'), 3.0, 4.0);
INSERT INTO db.ep (eventdate, exp_end, patid, rn) VALUES(TO_DATE('2010-01-24','YYYY-MM-DD'), TO_DATE('2010-02-06','YYYY-MM-DD'), 3.0, 3.0);
INSERT INTO db.ep (eventdate, exp_end, patid, rn) VALUES(TO_DATE('2010-01-24','YYYY-MM-DD'), NULL, 4.0, 2.0);
INSERT INTO db.ep (eventdate, exp_end, patid, rn) VALUES(TO_DATE('2010-01-01','YYYY-MM-DD'), NULL, 3.0, 1.0);
INSERT INTO db.ep (eventdate, exp_end, patid, rn) VALUES(TO_DATE('2010-02-11','YYYY-MM-DD'), TO_DATE('2010-02-24','YYYY-MM-DD'), 1.0, 2.0);
INSERT INTO db.ep (eventdate, exp_end, patid, rn) VALUES(TO_DATE('2010-02-21','YYYY-MM-DD'), TO_DATE('2010-03-06','YYYY-MM-DD'), 4.0, 5.0);
INSERT INTO db.ep (eventdate, exp_end, patid, rn) VALUES(TO_DATE('2010-01-07','YYYY-MM-DD'), NULL, 2.0, 2.0);
INSERT INTO db.ep (eventdate, exp_end, patid, rn) VALUES(TO_DATE('2010-01-01','YYYY-MM-DD'), NULL, 4.0, 1.0);
INSERT INTO db.ep (eventdate, exp_end, patid, rn) VALUES(TO_DATE('2010-01-12','YYYY-MM-DD'), NULL, 3.0, 2.0);
INSERT INTO db.ep (eventdate, exp_end, patid, rn) VALUES(TO_DATE('2010-01-28','YYYY-MM-DD'), NULL, 4.0, 3.0);
INSERT INTO db.ep (eventdate, exp_end, patid, rn) VALUES(TO_DATE('2010-02-21','YYYY-MM-DD'), TO_DATE('2010-03-06','YYYY-MM-DD'), 2.0, 4.0);
INSERT INTO db.ep (eventdate, exp_end, patid, rn) VALUES(TO_DATE('2010-01-01','YYYY-MM-DD'), TO_DATE('2010-01-14','YYYY-MM-DD'), 1.0, 1.0);
INSERT INTO db.ep (eventdate, exp_end, patid, rn) VALUES(TO_DATE('2004-02-26','YYYY-MM-DD'), TO_DATE('2004-03-10','YYYY-MM-DD'), 5.0, 1.0);
INSERT INTO db.ep (eventdate, exp_end, patid, rn) VALUES(TO_DATE('2004-09-03','YYYY-MM-DD'), NULL, 5.0, 2.0);
INSERT INTO db.ep (eventdate, exp_end, patid, rn) VALUES(TO_DATE('2004-09-09','YYYY-MM-DD'), NULL, 5.0, 3.0);
INSERT INTO db.ep (eventdate, exp_end, patid, rn) VALUES(TO_DATE('2004-09-21','YYYY-MM-DD'), TO_DATE('2004-10-04','YYYY-MM-DD'), 5.0, 4.0);
INSERT INTO db.ep (eventdate, exp_end, patid, rn) VALUES(TO_DATE('2004-10-05','YYYY-MM-DD'), NULL, 5.0, 5.0);
INSERT INTO db.ep (eventdate, exp_end, patid, rn) VALUES(TO_DATE('2004-10-14','YYYY-MM-DD'), NULL, 5.0, 6.0);
INSERT INTO db.ep (eventdate, exp_end, patid, rn) VALUES(TO_DATE('2004-10-28','YYYY-MM-DD'), TO_DATE('2004-11-10','YYYY-MM-DD'), 5.0, 7.0);

Summarily, these are the results I'm hoping for:-

patid    epi_start       epi_end       epi_num

  1      01JAN2010      14JAN2010            1
         11FEB2010      24FEB2010            2


  2      01JAN2010      14JAN2010            1
         21FEB2010      06MAR2010            2


  3      01JAN2010      06FEB2010            1
         21FEB2010      06MAR2010            2


  4      01JAN2010      17FEB2010            1
         21FEB2010      06MAR2010            2

  5      26FEB2004      10MAR2004            1
         03SEP2004      04OCT2004            2
         05OCT2004      10NOV2004            3

( Of course, the help I need is getting to the table above - I can process further to get this type of output.)

I can get the results easily enough in SAS but with SQL, I'm tearing my hair out! I imagine it needs recursive CTE SQL, something which I'm not great at...

I'm using Teradata SQL, by the way.

Any help will be great.

Upvotes: 0

Views: 261

Answers (1)

dnoeth
dnoeth

Reputation: 60462

If I understood correctly your rules can be rephrased as following:

  1. All rows within 4 weeks after the initial event (rn=1) belong to the first episode.

  2. After 4 weeks a new episode only starts when the gap between the current and the previous row is greater than 2 weeks.

Now it's no longer an iterative algorithm. Of course this can be done using a Recursive CTE, but I prefer Windowed Aggregates :-)

This seems to return the correct result:

SELECT patid, 
   MIN(eventdate) AS epi_start, 
   CASE
      WHEN MAX(eventdate) > MIN(eventdate) + 13 -- more than two weeks since start
      THEN MAX(eventdate) + 13 -- adjust end to two weeks after latest date 
      ELSE MIN(eventdate) + 13 -- two weeks after start
   END AS epi_end, 
   epi_num
FROM
 (
   SELECT dt.*,
      SUM(new_epi) -- calculate the episode number 
      OVER (PARTITION BY patid 
            ORDER BY rn 
            ROWS UNBOUNDED PRECEDING) + 1 AS epi_num
   FROM
    (
      SELECT ep.*,
                   -- less than 4 weeks after the initial date
         CASE WHEN eventdate < MIN(eventdate) OVER (PARTITION BY patid) + 28
                   -- less than 2 weeks after the previous date
                OR eventdate < MAX(eventdate) -- previous date (equivalent to LAG(eventdate)
                                OVER (PARTITION BY patid 
                                      ORDER BY rn
                                      ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) + 14 
              THEN 0 -- same episode
              ELSE 1 -- new episode starts
         END AS new_epi       
      FROM ep 
    ) AS dt
 ) AS dt
GROUP BY patid, epi_num
ORDER BY 1,4

If rn is created by a ROW_NUMBER you might get rid of that calculation by using ORDER BY eventdate instead of rn (if it's unique per patid)

Edit:

I didn't get your logic correctly, it can't be simplified, so it can't be done with OLAP, you need a recursive approach:

WITH RECURSIVE cte 
 (
   patid, 
   epi_start, 
   epi_end,
   potential_end,
   rn, 
   epi_num
 )
AS
 (
   SELECT patid, 
      eventdate AS epi_start, 
      eventdate + 13  AS epi_end,
      eventdate + 13 + 14 AS potential_end,
      rn, 
      CAST(1 AS SMALLINT) AS epi_num
   FROM ep
   WHERE rn = 1

   UNION ALL

   SELECT
      cte.patid,
      CASE WHEN ep.eventdate <= cte.epi_start + 13
           THEN cte.epi_start
           WHEN ep.eventdate <= cte.potential_end 
           THEN cte.epi_start
           ELSE ep.eventdate
      END AS epi_start_new,

      CASE WHEN ep.eventdate <= cte.epi_start + 13
           THEN cte.epi_end
           ELSE ep.eventdate + 13
      END AS epi_end_new,

      CASE WHEN ep.eventdate <= cte.epi_start + 13
           THEN cte.potential_end
           WHEN ep.eventdate <= cte.potential_end
           THEN ep.eventdate+ 13    
           ELSE ep.eventdate + 13 + 14
      END AS potential_end_new,  
      ep.rn,
      epi_num +
      CASE WHEN epi_start_new <> cte.epi_start THEN 1 ELSE 0 END
    FROM cte JOIN ep
     ON cte.patid = ep.patid
    AND cte.rn +1 = ep.rn
 ) 
SELECT patid, MIN(epi_start), MAX(epi_end), epi_num
FROM cte
GROUP BY patid, epi_num
ORDER BY 1,2

Upvotes: 1

Related Questions