Combining WITH, UNION and OR in JOIN

Question

I have a national database of all hospital records, and another national database of infection events. I am looking to extract all relevant hospital events for the infection, but I am struggling to find a way to optmise the query. OtherData is a proxy for another set of columns.

The infection data looks like this:

UniqueID	PatientNumber	HospitalNumber	Date	OtherData
14000000	1234	BAC	2022-01-27	DELTA
12007927	5412	HSA	2022-01-20	OMICRON
1	7862	UDO	2020-02-01	ALPHA

The hospital data looks like this:

EpisodeID	PatientNumber	HospitalNumber	StartDate	EndDate	OtherData
4	1234		2022-01-25	NA	ICU
987213	5412		2022-01-20	2022-01-27	DIED
3		BAC	2021-11-20	2022-01-20	DISCHARGED
3		BAC	2020-01-29	2022-02-10	DISCHARGED

The data can be missing lots of fields, and I have two identifiers (national and local) I can use to link the data. I query against both using UNION. But because these are National registers, and I'm dealing with Covid-19 data, we are talking about linking 10s of millions of records (on seperate servers). In order to minimise the amount of hospital data pulled in I am attempting to link between date ranges of the infection.

My query as it stands. It took 8 min pulling a single UniqueID when I tested it, and I have 10s of millions. Im not sure if I should be using and AND (X OR Y) with the dates in the INNER JOIN or if I should have them seperated, and use two more UNIONs.

This data is ingested into R for further processing and analytics.

Help appreciated!

DECLARE @days AS INT = 28;

WITH 
infections AS (
SELECT  UniqueID
  ,PatientNumber
  ,HospitalNumber
  ,Date
  ,OtherData
FROM infections
),
link_tbl AS (
SELECT
  i.UniqueID
  ,h.EpisodeID
  ,h.PatientNumber
  ,h.HospitalNumber
  ,h.StartDate
  ,h.EndDate
FROM infections i
INNER JOIN hospital h
  ON i.NHSNumber = h.NHSNumber
  AND (h.EndDate
    BETWEEN CONVERT(date, DATEADD(DAY, -@days, i.Date)) 
    AND CONVERT(date, DATEADD(DAY, @days, i.Date))
  OR h.StartDate
    BETWEEN CONVERT(date, DATEADD(DAY, -@days, i.Date)) 
    AND CONVERT(date, DATEADD(DAY, @days, i.Date))
  )
UNION

SELECT
  i.UniqueID
  ,h.EpisodeID
  ,h.NHSNumber
  ,h.HospitalNumber
  ,h.StartDate
  ,h.EndDate
FROM infections i
INNER JOIN hospital h
  ON i.HospitalNumber = h.HospitalNumber
  AND (h.EndDate
    BETWEEN CONVERT(date, DATEADD(DAY, -@days, i.Date)) 
    AND CONVERT(date, DATEADD(DAY, @days, i.Date))
  OR h.StartDate
    BETWEEN CONVERT(date, DATEADD(DAY, -@days, i.Date)) 
    AND CONVERT(date, DATEADD(DAY, @days, i.Date))
  )
)

SELECT 
  hospital.allothervars* ---(these are named)
  ,infections.* ---named in query
FROM hospital
INNER JOIN link_tbl ON hospital.EpisodeID = link_tbl.EpisodeID

Combining WITH, UNION and OR in JOIN

Answers (1)

Related Questions