neuropsych
neuropsych

Reputation: 305

Efficiently identify all FK items with n>3 dates within any 8 week period from a SQL table?

I have a ~400,000 row table containing the dates at which a collection of ~30,000 people had appointments. Each row has the patient ID number and an appointment date. I want to efficiently select people who had at least 4 appointments in an 8 week span. Ideally, I would also flag the appointments that were within this 8 week span as I did so. I am working in a server environment that does not allow CLR aggregate functions. Is this possible to do in SQL server? If so, how?

What I've thought about:

For clarity of discussion, here is some notation:

MyTable:

ApptID    PatientID    ApptDate (in smalldatetime)
--------------------------------------------------
Apt1       Pt1          Datetime1
Apt2       Pt1          Datetime2
Apt3       Pt2          Datetime3
...       ...          ...

Desired output (one option):

PatientID    4aptsIn8weeks? (Boolean)  InitialApptDateForWin
Pt1          1                         Datetime1
Pt2          0                         NULL
Pt3          1                         Datetime3
...

Desired output (another option):

ApptID     PatientID    ApptDate     InAn8wkWindow?   InitialApptDateForWin
Apt1       Pt1          Datetime1    1                Datetime1
Apt2       Pt1          Datetime2    1                Datetime1
Apt3       Pt2          Datetime3    0                NULL
...       ...          ...

But really, any output format that will in the end let me select patients and appointments that meet this criterion would be dandy....

Thanks for any ideas!


EDIT: Here's a slightly decompressed outline of my implementation of the selected answer below, just in case the details are helpful for anyone else (being new to SQL, it took me a couple stabs to get it working):

WITH MyTableAlias AS (
    SELECT * FROM MyTable
    )
SELECT MyTableAlias.PatientID, MyTable.Apptdate AS V1, 
    MyTableAlias.Apptdate AS V2
INTO temp1
FROM MyTable INNER JOIN MyTableAlias
ON (
   MyTable.PatientID = MyTableAlia.PatientID 
   AND (DATEDIFF(Wk,MyTable.Apptdate,MyTableAlias.Apptdate) <=8 )
   );

-- Since this gives for any given two visit dates 3 hits 
-- (V1-V1, V1-V2, V2-V2), delete the ones where the second visit is being 
-- selected as V1:
DELETE FROM temp1
WHERE V2<V1;

-- So far we have just selected pairs of visits within an 8 week
-- span of each other, including an entry for each visit being 
-- within 8 weeks of itself, but for the rest only including the item
-- where the second visit is after the first.  Now we want to look 
-- for examples of first visits where there are at least 4 hits:
SELECT PatientID, V1, MAX(V2) AS lastvisitinspan, DATEDIFF(Wk,V1,MAX(V2))
    AS nWeeksInSpan, COUNT(*) AS nWeeksInSpan
INTO MyOutputTable
FROM temp
GROUP BY PatientID, V1
HAVING COUNT(*)>3;

-- From here on it's just a matter of how I want to handle patients with two        
-- separate V1 examples meeting criteria...

Upvotes: 0

Views: 39

Answers (1)

Salman Arshad
Salman Arshad

Reputation: 272316

Rough outline of the query:

  • INNER JOIN the table ("table") with itself ("alias"), the ON clause would be:
    • table.patientid = alias.patientid
    • table.appointment_date < alias.appointment_date
    • datediff(table.appointment_date, alias.appointment_date) <= 8 week
  • Then GROUP BY table.patientid, table.appointment_date
  • Output table.patientid, table.appointment_date, MAX(alias.appointment_date), COUNT(*)
  • Add a HAVING COUNT(*) > n clause

There are some issues though:

  1. With 400,000 rows the JOIN could produce a very large result set
  2. It will count some date ranges twice. E.g. if there were 4 visits in 9 week period then it will return two rows (#1, #2, #3 and #2, #3, #4).

Upvotes: 0

Related Questions