How do you solve this SQL query?

Question

In this puzzle we have to group data based on Patients admission date and discharge date. If any Patients discharge date + 1 = admission date then we have group both rows into one row and sum costs from both the rows. Please check out the sample input and expected output for details.

Sample Input
PatientID   AdmissionDate   DischargeDate   Cost
1009        27-07-2014      31-07-2014      1050
1009        01-08-2014      23-08-2014      1070
1009        31-08-2014      31-08-2014      1900
1009        01-09-2014      14-09-2014      1260
1009        01-12-2014      31-12-2014      2090
1024        07-06-2014      28-06-2014      1900
1024        29-06-2014      31-07-2014      2900
1024        01-08-2014      02-08-2014      1800

Expected Output
PatientId   AdminssionDate  DischargeDate   Cost
1009        27-07-2014      23-08-2014      2120
1009        31-08-2014      14-09-2014      3160
1009        01-12-2014      31-12-2014      2090
1024        07-06-2014      02-08-2014      6600

I am unable to come up with a solution for this.

Use this to generate the table:

CREATE TABLE PatientProblem
(
PatientID integer,
AdmissionDate date,
DischargeDate date,
Cost numeric(20,2)
);
 
--Insert Data
INSERT INTO PatientProblem(PatientID,AdmissionDate,DischargeDate
,Cost)
VALUES
(1009,'2014-07-27','2014-07-31',1050.00),
(1009,'2014-08-01','2014-08-23',1070.00),
(1009,'2014-08-31','2014-08-31',1900.00),
(1009,'2014-09-01','2014-09-14',1260.00),
(1009,'2014-12-01','2014-12-31',2090.00),
(1024,'2014-06-07','2014-06-28',1900.00),
(1024,'2014-06-29','2014-07-31',2900.00),
(1024,'2014-08-01','2014-08-02',1800.00)

Thorsten Kettner · Accepted Answer

This is called a gaps and islands problem. We usually solve this with window functions. With LAG you can see the previous row's values. This helps us to mark the rows that start a new date range per patient. By building a running count of the number of marks then, we get group numbers for the patients' date ranges.

select
  patientid,
  min(admissiondate) as range_start,
  max(dischargedate) as range_end,
  sum(cost) as total_cost
from
(
  select
    patientid, admissiondate, dischargedate, cost,
    count(marker) over (partition by patientid order by admissiondate) as grp
  from
  (
    select
      patientid, admissiondate, dischargedate, cost,
      case when admissiondate > lag(dischargedate) over (partition by patientid order by admissiondate) + interval '1 day' then
        1
      end as marker
    from mytable
  ) marked
) grouped
group by patientid, grp
order by patientid, grp;

Demo: https://dbfiddle.uk/?rdbms=postgres_14&fiddle=d2ee8fede9bb999ada79310047e5ae27

How do you solve this SQL query?

Answers (1)

Related Questions