JBritton
JBritton

Reputation: 113

SQL: Recursive CTE by grouped date range

I am fairly new to SQL and I am having trouble assigning points to patients based on their visit date. I think a Recursive CTE is the best way to achieve this but I cannot wrap my head around actually writing it.

Each OrganizationMrn should be assigned 1 point for each visit EXCEPT when there has been another visit within the past 3 days. If the patient has multiple visits in 3 days, one point should be assigned.

Any help modifying my query to a recursive cte that assigns points as described would be greatly appreciated.

Examples:
A patient has these 7 visits: 1/1, 1/2, 1/3 1/4, 1/5, 1/6, 1/11. This patient should be assigned 3 points: 1 point (1/1-1/4), 1 point (1/5-1/6), 1 point (1/11).

Query explanation: Patient 25 should have 5 total points;
Date range 2015-10-02 - 2015-10-05 should be assigned 1 point;
Date range 2015-11-08 - 2015-11-09 should be assigned 1 point;
The other dates do not have another date within 3 days, they should be assigned 1 point.

WITH CTE AS
    (
    SELECT 
    ROW_NUMBER() OVER (PARTITION BY OrganizationMrn ORDER BY [Date]) AS ROWNUMBER, *
    FROM #RC
    )
    SELECT *, 
    ISNULL(DATEDIFF(DY,(SELECT OTHER.[Date] FROM CTE OTHER WHERE OTHER.OrganizationMrn = CTE.OrganizationMrn AND OTHER.ROWNUMBER = CTE.ROWNUMBER - 1), CTE.[Date]),0) AS DaysFromLastVisit, 
    CASE WHEN ISNULL(DATEDIFF(DY,(SELECT OTHER.[Date] FROM CTE OTHER WHERE OTHER.OrganizationMrn = CTE.OrganizationMrn AND OTHER.ROWNUMBER = CTE.ROWNUMBER - 1), CTE.[Date]),0) > 3 THEN 1 END AS POINTS
    FROM CTE
    ORDER BY OrganizationMrn, [Date];

    ROWNUMBER   OrganizationMrn Date        DaysFromLastVisit   POINTS
    1           25              2015-10-02  0                   NULL
    2           25              2015-10-03  1                   NULL
    3           25              2015-10-05  2                   NULL
    4           25              2015-11-08  34                  1
    5           25              2015-11-09  1                   NULL
    6           25              2016-03-04  116                 1
    7           25              2016-05-04  61                  1
    8           25              2016-05-10  6                   1

This is how #RC is being populated:

SELECT I.OrganizationMrn, CAST(R.DTTM AS DATE) AS Date
INTO #RC
FROM Standard SD 
    INNER JOIN Iorg I ON I.Person = SD.Patient
    INNER JOIN Result R ON I.Person = R.Patient
WHERE 
    R.Entry = 'note'
AND R.DTTM >= DATEADD(M,-12,GETDATE()) 
AND OrganizationMrn = '25'
ORDER BY I.OrganizationMrn;

OrganizationMrn Date
25              2015-10-02
25              2015-10-03
25              2015-10-05
25              2015-11-08
25              2015-11-09
25              2016-03-04
25              2016-05-04
25              2016-05-10

How can I modify this CASE statement to only assign points to one of of the 3 dates? It currently is assigning points to each day, 10/2, 10/3, 10/5.

CASE 
WHEN ISNULL(DATEDIFF(DY,(SELECT OTHER.[Date] FROM CTE OTHER WHERE OTHER.OrganizationMrn = CTE.OrganizationMrn AND OTHER.ROWNUMBER = CTE.ROWNUMBER - 1), CTE.[Date]),0) <= 3 
THEN 1 ELSE 0 END AS POINTS

Upvotes: 3

Views: 1342

Answers (3)

Terry Truong
Terry Truong

Reputation: 321

I am also new to SQL land. I've found this article from https://bertwagner.com/posts/gaps-and-islands, explaining pretty well how to solve this "gaps & islands problem".

Here is his sample SQL that is working for me:

DROP TABLE IF EXISTS #OverlappingDateRanges;
CREATE TABLE #OverlappingDateRanges (StartDate date, EndDate date);

INSERT INTO #OverlappingDateRanges
SELECT '8/24/2017', '9/23/2017'  UNION ALL
SELECT '8/24/2017', '9/20/2017'  UNION ALL 
SELECT '9/23/2017', '9/27/2017'  UNION ALL 
SELECT '9/25/2017', '10/10/2017' UNION ALL
SELECT '10/17/2017','10/18/2017' UNION ALL 
SELECT '10/25/2017','11/3/2017'  UNION ALL 
SELECT '11/3/2017', '11/15/2017'


SELECT
    MIN(StartDate) AS IslandStartDate,
    MAX(EndDate) AS IslandEndDate
FROM (
    SELECT
        *,
        CASE WHEN Groups.PreviousEndDate >= StartDate THEN 0 ELSE 1 END AS IslandStartInd,
        SUM(CASE WHEN Groups.PreviousEndDate >= StartDate THEN 0 ELSE 1 END) OVER (ORDER BY Groups.RN) AS IslandId
    FROM (
        SELECT
            ROW_NUMBER() OVER(ORDER BY StartDate,EndDate) AS RN,
            StartDate,
            EndDate,
            LAG(EndDate,1) OVER (ORDER BY StartDate, EndDate) AS PreviousEndDate
        FROM
            #OverlappingDateRanges
    ) Groups
) Islands
GROUP BY
    IslandId
ORDER BY 
    IslandStartDate

Upvotes: 1

Kamil Gosciminski
Kamil Gosciminski

Reputation: 17137

Straightforward logic getting first previous date into the current row using lag() window function and then based on the output assigning points or not with conditional sum().

Assign a point when previous date doesn't fall between date and 3 days behind or when there is no previos date to an event:

select 
  organizationmrn, 
  sum(case when 
        prev_date not between dateadd(day, -3, date) and date 
        or prev_date is null 
      then 1 else 0 end) as points
from (
  select 
    *, 
    lag(date,1) over (partition by organizationmrn order by date) as prev_date
  from rc 
  ) calculate_prev_date
group by organizationmrn

Result

 organizationmrn | points
-----------------+--------
              25 |      5

Upvotes: 1

dazedandconfused
dazedandconfused

Reputation: 3186

Do you need to output each individual group or is it enough to know the point value? If the latter, you can think of this as a variation of the "gaps and islands" problem. There is an excellent article here if you want a deep dive. I'm adapting one of the code snippets from that page here.

Define a starting point as a record that has no records within 3 days prior to it. An end point is a record that has no records within 3 days after. Once each island has been identified, we can take the number of days between the starting and ending point, and determine how many 3 day groups fit within it by dividing and rounding the answer up. Note: Code below is hard-coded for organization 1.

CREATE TABLE #t(
    OrganizationMrn int,
    VisitDate date)

INSERT #t(OrganizationMrn, VisitDate) VALUES 
    (1, '1/1/2016'),
    (1, '1/2/2016'),
    (1, '1/3/2016'),
    (1, '1/4/2016'),
    (1, '1/5/2016'),
    (1, '1/6/2016'),
    (1, '1/11/2016')

;WITH StartingPoints AS (
    SELECT VisitDate, ROW_NUMBER() OVER (ORDER BY VisitDate) AS Sequence FROM #t AS A 
    WHERE A.OrganizationMrn = 1 AND NOT EXISTS (
        SELECT * FROM #t AS B 
        WHERE B.OrganizationMrn = A.OrganizationMrn AND 
            B.VisitDate >= DATEADD(day, -4, A.VisitDate)
            AND 
            B.VisitDate < A.VisitDate
        )
),
EndingPoints AS (
    SELECT VisitDate, ROW_NUMBER() OVER (ORDER BY VisitDate) AS Sequence FROM #t AS A 
    WHERE A.OrganizationMrn = 1 AND NOT EXISTS (
        SELECT * FROM #t AS B 
        WHERE B.OrganizationMrn = A.OrganizationMrn AND 
            B.VisitDate <= DATEADD(day, 4, A.VisitDate)
            AND 
            B.VisitDate > A.VisitDate
        )
)

SELECT 
     S.VisitDate AS StartDate
    ,E.VisitDate AS EndDate 
    ,CEILING((DATEDIFF(day, S.VisitDate, E.VisitDate) + 1) / 4.0) AS Points
FROM 
    StartingPoints AS S 
    JOIN EndingPoints AS E ON (E.Sequence = S.Sequence)

Upvotes: 1

Related Questions