Batman
Batman

Reputation: 8917

Aggregate/Windowed Function To Find Min and Max of Sequential Rows

I've got a SQL table where I want to find the first and last dates of a group of records, providing they're sequential.

 Patient | TestType | Result |     Date
------------------------------------------
    1    |    1     |   A    |  2012-03-04
    1    |    1     |   A    |  2012-08-19
    1    |    1     |   B    |  2013-05-27
    1    |    1     |   A    |  2013-06-20
    1    |    2     |   X    |  2012-08-19
    1    |    2     |   X    |  2013-06-20
    2    |    1     |   B    |  2014-09-09
    2    |    1     |   B    |  2015-04-19

Should be returned as

Patient | TestType | Result | StartDate   |   EndDate
--------------------------------------------------------
   1    |    1     |   A    |  2012-03-04 | 2012-08-19 
   1    |    1     |   B    |  2013-05-27 | 2013-05-27
   1    |    1     |   A    |  2013-06-20 | 2013-06-20 
   1    |    2     |   X    |  2012-08-19 | 2013-06-20
   2    |    1     |   B    |  2014-09-09 | 2015-04-19

The problem is that if I just group by Patient, TestType, and Result, then the first and third rows in the example above would become a single row.

Patient | TestType | Result | StartDate   |   EndDate
--------------------------------------------------------
   1    |    1     |   A    |  2012-03-04 | 2013-06-20 
   1    |    1     |   B    |  2013-05-27 | 2013-05-27
   1    |    2     |   X    |  2012-08-19 | 2013-06-20
   2    |    1     |   B    |  2014-09-09 | 2015-04-19

I feel like there's got to be something clever I can do with a partition, but I can't quite figure out what it is.

Upvotes: 1

Views: 84

Answers (3)

Steve Kass
Steve Kass

Reputation: 7184

See if this gives you what you need.

with T1 as (
  select
    *,
    case when lag(Patient,1)
           over (order by Patient, TestType, Result) = Patient
          and lag(TestType,1) 
           over (order by Patient, TestType, Result) = TestType
          and lag(Result,1)
           over (order by Patient, TestType, Result) = Result
    then null else 1 end as Changes
  from t
), T2 as (
  select
    Patient,
    TestType,
    Result,
    dt,
    sum(Changes) over (
      order by Patient, TestType, Result, dt
    ) as seq
  from T1
)
  select
    Patient,
    TestType,
    Result,
    min(dt) as dtFrom,
    max(dt) as dtTo
  from T2
  group by Patient, TestType, Result, seq
  order by Patient, TestType, Result

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1269683

There are several ways to approach this. I like identifying the groups using the difference of row number values:

select patient, testtype, result,
       min(date) as startdate, max(date) as enddate
from (select t.*, 
             (row_number() over (partition by patient, testtype order by date) -
              row_number() over (partition by patient, testtype, result order by date)
             ) as grp
      from table t
     ) t
group by patient, testtype, result, grp
order by patient, startdate;

Upvotes: 2

Vamsi Prabhala
Vamsi Prabhala

Reputation: 49260

select patient, testtype, result, date as startdate, 
isnull(lead(date) over(partition by patient, testtype, result order by date), date) as enddate
from tablename;

You can use lead function to get the value of date (as enddate) from the next row in each group.

SQL Fiddle with sample data.

Upvotes: 0

Related Questions