Reputation: 23
Working with healthcare data in SQL and I'm trying to fill in records when there are overlapping dates for thousands of records. This mostly occurs when a patient is in a nursing home and then transferred to a hospital. They have not been discharged yet from the nursing home because their bed is being kept available for them, so that's why the discharge date is missing. Here's one example of what the records look like:
Patient | Admission_Date | Discharge_Date | Location |
---|---|---|---|
ABC | 1/2/2021 | MISSING | NURSING HOME |
ABC | 2/3/2021 | 2/4/2021 | ICU |
ABC | 4/10/2021 | 4/13/2021 | HOSPITAL |
Here's what I need it to look like:
Patient | Admission_Date | Discharge_Date | Location |
---|---|---|---|
ABC | 1/2/2021 | 2/3/2021 | NURSING HOME |
ABC | 2/3/2021 | 2/4/2021 | ICU |
ABC | 2/4/2021 | 4/10/2021 | NURSING HOME |
ABC | 4/10/2021 | 4/13/2021 | HOSPITAL |
ABC | 4/13/2021 | MISSING | NURSING HOME |
Any ideas on how to get this done?
I've tried using a combination of multiple CASE WHEN statements and lag/leads but it's obviously very inefficient and difficult to account for all of the possible records between nursing home stays.
Upvotes: 1
Views: 88
Reputation: 5
For the exact example from above the solution is here. I haven't tested it for multiple patients or multiple missing dates. But I can, if it's not enough.
CREATE TABLE Patient_Locations (
Patient nvarchar(100) NOT NULL
, Admission_Date date NOT NULL
, Discharge_Date date NULL
, Location nvarchar(100)
)
INSERT INTO Patient_Locations (Patient, Admission_Date, Discharge_Date, Location)
VALUES ('ABC', '20210102', NULL, 'NURSING HOME')
, ('ABC', '20210203', '20210204', 'ICU')
, ('ABC', '20210410', '20210413', 'HOSPITAL')
;WITH Next_Known_Dates (Patient, Admission_Date, Discharge_Date, Location, NEXT_Admission_Date)
AS (
SELECT Patient
, Admission_Date
, Discharge_Date
, Location
, NEXT_Admission_Date = LEAD(Admission_Date, 1) OVER (Partition by Patient ORDER BY Admission_Date)
FROM Patient_Locations
)
-- First row where Discharge_Date is Missing. Replace by next known period
SELECT Patient
, Admission_Date
, Discharge_Date = NEXT_Admission_Date
, Location
FROM Next_Known_Dates
WHERE Discharge_Date IS NULL
UNION ALL
-- Just normal rows
SELECT Patient
, Admission_Date
, Discharge_Date
, Location
FROM Patient_Locations
WHERE Discharge_Date IS NOT NULL
UNION ALL
--Little magic. Replace missing dates with 'normal' Discharge_date and NEXT_Admission_Date
SELECT Patient_Locations.Patient
, Next_Known_Dates.Discharge_Date
, Next_Known_Dates.NEXT_Admission_Date
, Patient_Locations.Location
FROM Patient_Locations
JOIN Next_Known_Dates
ON Next_Known_Dates.Patient = Patient_Locations.Patient
AND Next_Known_Dates.Admission_Date > Patient_Locations.Admission_Date
WHERE Patient_Locations.Discharge_Date IS NULL
AND Next_Known_Dates.Discharge_Date IS NOT NULL
ORDER BY 1, 2, 3
You can try it here: https://sqlize.online/sql/mssql2022/14e363a81956b2e9019fc44daf47a8c7/
Look at this select for the Idea of a new column NEXT_Admission_Date:
SELECT Patient
, Admission_Date
, Discharge_Date
, Location
, NEXT_Admission_Date = LEAD(Admission_Date, 1) OVER (Partition by Patient ORDER BY Admission_Date)
FROM Patient_Locations
Upvotes: 0