Reputation: 706
I have a table of patients of whom some have one or more 'events' (basically dates). I need to process these dates into episodes using the following steps:-
Following an event, project forwards 14 days and use as a potential end-of-episode date.
a. Events within this period are ignored, assumed related to the initial events.If there are no events in the 14 days after the potential end-of-episode date, halt the algorithm - the episode has been fully identified.
If there is an event in the 14 days after the potential end-of-episode date, move potential end-of-episode date to new event date + 14 days.
Repeat from step 2 until there is 14 days from the end-of-episode date clear of events.;
Below is some test data. The 'real' data is patid (patient identifier), eventdate and rn (a unique row number, reset to 1 for each patient, applied in date order). Exp_end is my own entry to indicate the end date of episode. It's on the row that would be the last event for the episode:-
"patid","rn", "eventdate", "exp_end"
1, 1, 01-01-2010, 14-01-2010
1, 2, 11-02-2010, 24-02-2010
2, 1, 01-01-2010,
2, 2, 07-01-2010,
2, 3, 12-01-2010, 14-01-2010
2, 4, 21-02-2010, 06-03-2010
3, 1, 01-01-2010,
3, 2, 12-01-2010,
3, 3, 24-01-2010, 06-02-2010
3, 4, 21-02-2010, 06-03-2010
4, 1, 01-01-2010,
4, 2, 24-01-2010,
4, 3, 28-01-2010,
4, 4, 04-02-2010, 17-02-2010
4, 5, 21-02-2010, 06-03-2010
5, 1, 26-02-2004, 10-03-2004
5, 2, 03-09-2004,
5, 3, 09-09-2004,
5, 4, 21-09-2004, 04-10-2004
5, 5, 05-10-2004,
5, 6, 14-10-2004,
5, 7, 28-10-2004, 10-11-2004
The SQL code produces the same table:-
CREATE TABLE dp.ep( eventdate DATE FORMAT 'YY/MM/DD', exp_end DATE FORMAT 'YY/MM/DD', patid INTEGER, rn INTEGER) PRIMARY INDEX pats ( patid );
INSERT INTO db.ep (eventdate, exp_end, patid, rn) VALUES(TO_DATE('2010-01-01','YYYY-MM-DD'), NULL, 2.0, 1.0);
INSERT INTO db.ep (eventdate, exp_end, patid, rn) VALUES(TO_DATE('2010-02-04','YYYY-MM-DD'), TO_DATE('2010-02-17','YYYY-MM-DD'), 4.0, 4.0);
INSERT INTO db.ep (eventdate, exp_end, patid, rn) VALUES(TO_DATE('2010-01-12','YYYY-MM-DD'), TO_DATE('2010-01-14','YYYY-MM-DD'), 2.0, 3.0);
INSERT INTO db.ep (eventdate, exp_end, patid, rn) VALUES(TO_DATE('2010-02-21','YYYY-MM-DD'), TO_DATE('2010-03-06','YYYY-MM-DD'), 3.0, 4.0);
INSERT INTO db.ep (eventdate, exp_end, patid, rn) VALUES(TO_DATE('2010-01-24','YYYY-MM-DD'), TO_DATE('2010-02-06','YYYY-MM-DD'), 3.0, 3.0);
INSERT INTO db.ep (eventdate, exp_end, patid, rn) VALUES(TO_DATE('2010-01-24','YYYY-MM-DD'), NULL, 4.0, 2.0);
INSERT INTO db.ep (eventdate, exp_end, patid, rn) VALUES(TO_DATE('2010-01-01','YYYY-MM-DD'), NULL, 3.0, 1.0);
INSERT INTO db.ep (eventdate, exp_end, patid, rn) VALUES(TO_DATE('2010-02-11','YYYY-MM-DD'), TO_DATE('2010-02-24','YYYY-MM-DD'), 1.0, 2.0);
INSERT INTO db.ep (eventdate, exp_end, patid, rn) VALUES(TO_DATE('2010-02-21','YYYY-MM-DD'), TO_DATE('2010-03-06','YYYY-MM-DD'), 4.0, 5.0);
INSERT INTO db.ep (eventdate, exp_end, patid, rn) VALUES(TO_DATE('2010-01-07','YYYY-MM-DD'), NULL, 2.0, 2.0);
INSERT INTO db.ep (eventdate, exp_end, patid, rn) VALUES(TO_DATE('2010-01-01','YYYY-MM-DD'), NULL, 4.0, 1.0);
INSERT INTO db.ep (eventdate, exp_end, patid, rn) VALUES(TO_DATE('2010-01-12','YYYY-MM-DD'), NULL, 3.0, 2.0);
INSERT INTO db.ep (eventdate, exp_end, patid, rn) VALUES(TO_DATE('2010-01-28','YYYY-MM-DD'), NULL, 4.0, 3.0);
INSERT INTO db.ep (eventdate, exp_end, patid, rn) VALUES(TO_DATE('2010-02-21','YYYY-MM-DD'), TO_DATE('2010-03-06','YYYY-MM-DD'), 2.0, 4.0);
INSERT INTO db.ep (eventdate, exp_end, patid, rn) VALUES(TO_DATE('2010-01-01','YYYY-MM-DD'), TO_DATE('2010-01-14','YYYY-MM-DD'), 1.0, 1.0);
INSERT INTO db.ep (eventdate, exp_end, patid, rn) VALUES(TO_DATE('2004-02-26','YYYY-MM-DD'), TO_DATE('2004-03-10','YYYY-MM-DD'), 5.0, 1.0);
INSERT INTO db.ep (eventdate, exp_end, patid, rn) VALUES(TO_DATE('2004-09-03','YYYY-MM-DD'), NULL, 5.0, 2.0);
INSERT INTO db.ep (eventdate, exp_end, patid, rn) VALUES(TO_DATE('2004-09-09','YYYY-MM-DD'), NULL, 5.0, 3.0);
INSERT INTO db.ep (eventdate, exp_end, patid, rn) VALUES(TO_DATE('2004-09-21','YYYY-MM-DD'), TO_DATE('2004-10-04','YYYY-MM-DD'), 5.0, 4.0);
INSERT INTO db.ep (eventdate, exp_end, patid, rn) VALUES(TO_DATE('2004-10-05','YYYY-MM-DD'), NULL, 5.0, 5.0);
INSERT INTO db.ep (eventdate, exp_end, patid, rn) VALUES(TO_DATE('2004-10-14','YYYY-MM-DD'), NULL, 5.0, 6.0);
INSERT INTO db.ep (eventdate, exp_end, patid, rn) VALUES(TO_DATE('2004-10-28','YYYY-MM-DD'), TO_DATE('2004-11-10','YYYY-MM-DD'), 5.0, 7.0);
Summarily, these are the results I'm hoping for:-
patid epi_start epi_end epi_num
1 01JAN2010 14JAN2010 1
11FEB2010 24FEB2010 2
2 01JAN2010 14JAN2010 1
21FEB2010 06MAR2010 2
3 01JAN2010 06FEB2010 1
21FEB2010 06MAR2010 2
4 01JAN2010 17FEB2010 1
21FEB2010 06MAR2010 2
5 26FEB2004 10MAR2004 1
03SEP2004 04OCT2004 2
05OCT2004 10NOV2004 3
( Of course, the help I need is getting to the table above - I can process further to get this type of output.)
I can get the results easily enough in SAS but with SQL, I'm tearing my hair out! I imagine it needs recursive CTE SQL, something which I'm not great at...
I'm using Teradata SQL, by the way.
Any help will be great.
Upvotes: 0
Views: 261
Reputation: 60462
If I understood correctly your rules can be rephrased as following:
All rows within 4 weeks after the initial event (rn=1) belong to the first episode.
After 4 weeks a new episode only starts when the gap between the current and the previous row is greater than 2 weeks.
Now it's no longer an iterative algorithm. Of course this can be done using a Recursive CTE, but I prefer Windowed Aggregates :-)
This seems to return the correct result:
SELECT patid,
MIN(eventdate) AS epi_start,
CASE
WHEN MAX(eventdate) > MIN(eventdate) + 13 -- more than two weeks since start
THEN MAX(eventdate) + 13 -- adjust end to two weeks after latest date
ELSE MIN(eventdate) + 13 -- two weeks after start
END AS epi_end,
epi_num
FROM
(
SELECT dt.*,
SUM(new_epi) -- calculate the episode number
OVER (PARTITION BY patid
ORDER BY rn
ROWS UNBOUNDED PRECEDING) + 1 AS epi_num
FROM
(
SELECT ep.*,
-- less than 4 weeks after the initial date
CASE WHEN eventdate < MIN(eventdate) OVER (PARTITION BY patid) + 28
-- less than 2 weeks after the previous date
OR eventdate < MAX(eventdate) -- previous date (equivalent to LAG(eventdate)
OVER (PARTITION BY patid
ORDER BY rn
ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) + 14
THEN 0 -- same episode
ELSE 1 -- new episode starts
END AS new_epi
FROM ep
) AS dt
) AS dt
GROUP BY patid, epi_num
ORDER BY 1,4
If rn
is created by a ROW_NUMBER
you might get rid of that calculation by using ORDER BY eventdate
instead of rn
(if it's unique per patid
)
Edit:
I didn't get your logic correctly, it can't be simplified, so it can't be done with OLAP, you need a recursive approach:
WITH RECURSIVE cte
(
patid,
epi_start,
epi_end,
potential_end,
rn,
epi_num
)
AS
(
SELECT patid,
eventdate AS epi_start,
eventdate + 13 AS epi_end,
eventdate + 13 + 14 AS potential_end,
rn,
CAST(1 AS SMALLINT) AS epi_num
FROM ep
WHERE rn = 1
UNION ALL
SELECT
cte.patid,
CASE WHEN ep.eventdate <= cte.epi_start + 13
THEN cte.epi_start
WHEN ep.eventdate <= cte.potential_end
THEN cte.epi_start
ELSE ep.eventdate
END AS epi_start_new,
CASE WHEN ep.eventdate <= cte.epi_start + 13
THEN cte.epi_end
ELSE ep.eventdate + 13
END AS epi_end_new,
CASE WHEN ep.eventdate <= cte.epi_start + 13
THEN cte.potential_end
WHEN ep.eventdate <= cte.potential_end
THEN ep.eventdate+ 13
ELSE ep.eventdate + 13 + 14
END AS potential_end_new,
ep.rn,
epi_num +
CASE WHEN epi_start_new <> cte.epi_start THEN 1 ELSE 0 END
FROM cte JOIN ep
ON cte.patid = ep.patid
AND cte.rn +1 = ep.rn
)
SELECT patid, MIN(epi_start), MAX(epi_end), epi_num
FROM cte
GROUP BY patid, epi_num
ORDER BY 1,2
Upvotes: 1