Counting sequential events and counts of sequences SQL

I have a query that I built using an answer found here and it was very helpful. I have added some things to it to suit my needs. One of the things that I added was a ROW_NUMBER() in order to count how many times someone has been readmitted within 30 days over any time length. I have inserted the cte results into a temp table as suggested in the first answer and by a question that was posted here. This does not solve thought, the sequence length and sequence count issue.

This is the query:

-- CREATE TABLE TO STORE CTE RESULTS
DECLARE @PPR TABLE(
    VISIT1      VARCHAR(20) -- THIS IS A UNIQUE VALUE
    , READMIT   VARCHAR(20) -- THIS IS A UNIQUE VALUE
    , MRN       VARCHAR(10) -- THIS IS UNIQUE TO A PERSON
    , INIT_DISC DATETIME
    , RA_ADM    DATETIME
    , R1        INT
    , R2        INT
    , INTERIM1  VARCHAR(20)
    , RA_COUNT  INT
    , FLAG      VARCHAR(2)
);

-- THE CTE THAT WILL GET USED TO POPULATE THE ABOVE TABLE
WITH cte AS (
  SELECT PTNO_NUM
    , Med_Rec_No
    , Dsch_Date
    , Adm_Date
    , ROW_NUMBER() OVER (
                         PARTITION BY MED_REC_NO 
                         ORDER BY PtNo_Num
                         ) AS r

  FROM smsdss.BMH_PLM_PtAcct_V

  WHERE Plm_Pt_Acct_Type = 'I'
  AND PtNo_Num < '20000000' 
  )

-- INSERT CTE RESULTS INTO PPR TABLE
INSERT INTO @PPR
SELECT
c1.PtNo_Num                                AS [INDEX]
, c2.PtNo_Num                              AS [READMIT]
, c1.Med_Rec_No                            AS [MRN]
, c1.Dsch_Date                             AS [INITIAL DISCHARGE]
, c2.Adm_Date                              AS [READMIT DATE]
, C1.r
, C2.r
, DATEDIFF(DAY, c1.Dsch_Date, c2.Adm_Date) AS INTERIM1
, ROW_NUMBER() OVER (
                    PARTITION BY C1.MED_REC_NO 
                    ORDER BY C1.PTNO_NUM ASC
                    ) AS [RA COUNT]

, CASE 
    WHEN DATEDIFF(DAY, c1.Dsch_Date, c2.Adm_Date) <= 30 
    THEN 1 
    ELSE 0
  END [FLAG]

FROM cte       C1
INNER JOIN cte C2
ON C1.Med_Rec_No = C2.Med_Rec_No

WHERE C1.Adm_Date <> C2.Adm_Date
AND C1.r + 1 = C2.r

ORDER BY C1.Med_Rec_No, C1.Dsch_Date

-- MANIPULATE PPR TABLE
SELECT PPR.VISIT1
, PPR.READMIT
, PPR.MRN
, PPR.INIT_DISC
, PPR.RA_ADM
--, PPR.R1
--, PPR.R2
, PPR.INTERIM1
--, PPR.RA_COUNT
, PPR.FLAG
-- THE BELOW DOES NOT WORK AT ALL
, CASE
    WHILE (SELECT PPR.INTERIM1 FROM @PPR PPR) <= 30
    BEGIN
        ROW_NUMBER() OVER (PARTITION BY PPR.MRN, PPR.VISIT1
                           ORDER BY PPR.VISIT1
                           )
        IF (SELECT PPR.INTERIM1 FROM @PPR PPR) > 30
            BREAK
    END
  END


FROM @PPR PPR

WHERE PPR.MRN = 'A NUMBER'

Example of current output:

INDEX | READMIT | MRN | INIT DISCHARGE | RA DATE   | INTERIM | RACOUNT | FLAG | FLAG_2
12345 | 12349   | 123 | 2005-07-05     | 2005-07-09| 4       | 1       | 1    | 0
12349 | 12351   | 123 | 2005-07-11     | 2005-07-15| 4       | 2       | 1    | 0

So the third line is obviously not a readmit in 30 days but just a point in time where the patient came back to the hospital so the RA_Count goes back to 1 and the flag goes to 0 because it is not a 30day readmit.

Should I create a table instead of using a cte?

What I would like to add is a Chain Length and a Chain Count. Here are some definitions:

Chain Length: How many times in a row has someone been readmitted within 30 days of subsequent visits.

For example

INDEX | READMIT | MRN  | INITIAL DISCHARGE    | READMIT DATE | CHAIN LEN | Count
123   | 133     | 1236 | 2009-05-13           | 2009-06-12   | 1         | 1
133   | 145     | 1236 | 2009-06-16           | 2009-07-04   | 2         | 1
145   | 157     | 1236 | 2009-07-06           | 2009-07-15   | 3         | 1
165   | 189     | 1236 | 2011-01-01           | 2011-01-12   | 1         | 2
189   | 195     | 1236 | 2011-02-06           | 2011-03-01   | 2         | 2

Chain count would then be how many chains are there: so in the above table there would be 2. I am trying to use the case statement to make the chain length

Here is an SQL Fiddle with some sample data as it will appear before the CTE is executed SQL Fiddle

Thank you,

Upvotes: 2

Answers (3)

JBrooks

Reputation: 10013

The solution is to start with a view that summarizes the data for each VisitID. I use a view for this instead of CTE because it seems like something you are going to us in more than 1 time.

    create view vReadmits
    as
    select t.VisitID,
    t.UID,
    min(r.AdmitDT) ReadmittedDT, 
    min(r.VisitID) NextVisitID,
    sum(case when r.AdmitDT < dateadd(d, 30, isnull(t.DischargeDT, t.AdmitDT)) 
        then 1 else 0 end) ReadmitNext30
    from t 
    left join t as r
    on t.UID = r.UID
    and t.VisitID < r.VisitID
    group by t.VisitID,
    t.UID

This takes each VisitID and finds the next VisitID for that UID. At the same time it sums up the future visits that are less then 30 days. It uses ISNULL() to account for the missing DischargeDTs.

You can then add the logic for Chains in a CTE. Then you can join to the view and CTE to include the columns in the view.

with Chains as
(
select v.UID,
sum(case when r.ReadmittedDT < dateadd(d, 30, v.ReadmittedDT)
then 0 else 1 end) as ChainCount    
from vReadmits v
left join vReadmits r
on r.NextVisitID = v.VisitID
group by v.UID
)

select t.UID, 
t.VisitId, 
t.AdmitDT, 
t.DischargeDT, 
v.NextVisitID, 
v.ReadmitNext30, 
v.ReadmittedDT,
c.ChainCount
from t 
join vReadmits v
on t.VisitID = v.VisitID
inner join Chains c
on v.UID = c.UID
order by t.UID, t.VisitID

Here is the SQLFiddle

The assumption I made was that if the VisitID was greater than another, then its AdmitDT would be greater too. This should be the case (especially for the same UID), but if not you would change the view to use AdmitDTs instead of VisitID.

Upvotes: 2

Bogdan Sahlean

Reputation: 1

Update #1: Two events are linked if maximum difference between them is 30 days. [COUNT] values are generated per person.

You could adapt following example which use a recursive common table expression:

CREATE TABLE dbo.Events (
    EventID INT IDENTITY(1,1) PRIMARY KEY,
    EventDate DATE NOT NULL,
    PersonID INT NOT NULL
);
GO
INSERT dbo.Events (EventDate, PersonID)
VALUES 
    ('2014-01-01', 1), ('2014-01-05', 1), ('2014-02-02', 1), ('2014-03-30', 1), ('2014-04-04', 1), 
    ('2014-01-11', 2), ('2014-02-02', 2),
    ('2014-01-03', 3), ('2014-03-03', 3);
GO

DECLARE @EventsWithNum TABLE (
    EventID INT NOT NULL,
    EventDate DATE NOT NULL,
    PersonID INT NOT NULL,
    EventNum INT NOT NULL,
    PRIMARY KEY (EventNum, PersonID)
);
INSERT  @EventsWithNum
SELECT  crt.EventID, crt.EventDate, crt.PersonID,
        ROW_NUMBER() OVER(PARTITION BY crt.PersonID ORDER BY crt.EventDate, crt.EventID) AS EventNum
FROM    dbo.Events crt;

WITH CountingSequentiaEvents
AS (
    SELECT  crt.EventID, crt.EventDate, crt.PersonID, crt.EventNum,
            1 AS GroupNum,
            1 AS GroupEventNum
    FROM    @EventsWithNum crt
    WHERE   crt.EventNum = 1

    UNION ALL 

    SELECT  crt.EventID, crt.EventDate, crt.PersonID, crt.EventNum,
            CASE 
                WHEN DATEDIFF(DAY, prev.EventDate, crt.EventDate) <= 30 THEN prev.GroupNum
                ELSE prev.GroupNum + 1 
            END AS GroupNum,
            CASE 
                WHEN DATEDIFF(DAY, prev.EventDate, crt.EventDate) <= 30 THEN prev.GroupEventNum + 1
                ELSE 1 
            END AS GroupEventNum
    FROM    @EventsWithNum crt JOIN CountingSequentiaEvents prev ON crt.PersonID = prev.PersonID
    AND     crt.EventNum = prev.EventNum + 1
)
SELECT  x.EventID, x.EventDate, x.PersonID,
        x.GroupEventNum AS [CHAIN LEN],
        x.GroupNum AS [Count]
FROM    CountingSequentiaEvents x
ORDER BY x.PersonID, x.EventDate
-- 1000 means 1000 + 1 = maximum 1001 events / person
OPTION (MAXRECURSION 1000); -- Please read http://msdn.microsoft.com/en-us/library/ms175972.aspx (section Guidelines for Defining and Using Recursive Common Table Expressions)

Output:

EventID EventDate  PersonID CHAIN LEN Count
------- ---------- -------- --------- -----
1       2014-01-01 1        1         1
2       2014-01-05 1        2         1
3       2014-02-02 1        3         1
------- ---------- -------- --------- -----
4       2014-03-30 1        1         2
5       2014-04-04 1        2         2
------- ---------- -------- --------- -----
6       2014-01-11 2        1         1
7       2014-02-02 2        2         1
------- ---------- -------- --------- -----
8       2014-01-03 3        1         1
------- ---------- -------- --------- -----
9       2014-03-03 3        1         2
------- ---------- -------- --------- -----

As you can see

enter image description here

the execution plan contains, for the last statement, two Index Seek operators because this constraint PRIMARY KEY (EventNum, PersonID) defined on @EventsWithNum forces SQL Server to create (in this case) a clustered index with a compound key EventNum, PersonID.

Also, we can see that the estimate cost for INSERT @EventsWithNum ... is greater than the estimated cost for WITH CountingSequentiaEvents (...) SELECT ... FROM CountingSequentiaEvents ....

Upvotes: 4

Chuck Buford

Reputation: 379

There's one drawback to remember about CTE's.

Every time you reference them, it re-executes the query.

In your query, you are referencing CTE twice (c1 and c2). This means that it is executing the query twice. You would be better off storing the CTE results in a table variable or temp table, then performing your joins.

Upvotes: 2

Counting sequential events and counts of sequences SQL

Answers (3)

Related Questions