Grouping together date periods based on two separate rankings/groupings

Question

Hi all sorry about my poorly worded title I am unsure as to how to phrase exactly what I need. But i will try and explain it better below:

I have a dataset that looks like this:

DECLARE @TestDATA TABLE (PERSON_ID int, START_DATE date, END_DATE date,SERVICE_RANK int)

INSERT INTO @TestDATA 
VALUES 
(123, '2018-01-31', '2018-02-14', 7), 
(123, '2018-03-28', '2018-04-11', 4), 
(123, '2018-04-12', '2018-04-30', 4), 
(123, '2018-05-25', '2018-06-08', 7), 
(123, '2018-06-08', '2018-06-15', 7), 
(123, '2018-06-19', '2018-06-26', 7), 
(123, '2018-06-26', '2018-09-28', 4), 
(123, '2018-10-10', '2018-11-07', 7), 
(123, '2018-11-27', '2018-12-11', 7), 
(123, '2018-12-11', '2018-12-24', 7)

Which shows a date range and "service rank" for each person (there is only one person in this example but there are 10's of thousands in the database)

Where for each person_id and each service_rank I would like to group the date periods to identify how many distinct periods they have had. So in the above example this is what I would be looking for:

PERSON ID, START_DATE,  END_DATE,  SERVICE_RANK, SERVICE_PERIOD    
    123    2018-01-31   2018-02-14     7             1
    123    2018-03-28   2018-04-11     4             2
    123    2018-04-12   2018-04-30     4             2
    123    2018-05-25   2018-06-08     7             3
    123    2018-06-08   2018-06-15     7             3
    123    2018-06-19   2018-06-26     7             3
    123    2018-06-26   2018-09-28     4             4
    123    2018-10-10   2018-11-07     7             5
    123    2018-11-27   2018-12-11     7             5
    123    2018-12-11   2018-12-24     7             5

I have tried row_number, rank, dense_rank and even had a go at the dreaded CURSOR FOR but I cannot get anything work as the windowed functions see the service ranks as the same, so for the above example it would see two service ranks when there are actually 5 they just share the same numbering.

Also in the dataset not every person will jump from one service_rank to another and back. They may go from one to another (eg 4 -> 7) and stay there or they may only have one service_rank over multiple rows.

Any ideas??

Gordon Linoff · Accepted Answer

This is a gaps-and-islands problem. For this purpose, one method is lag() and a cumulative sum:

select t.*,
       sum(case when prev_service_rank = service_rank then 0 else 1 end) over (partition by person_id order by start_date) as service_period
from (select t.*,
             lag(service_rank) over (partition by person_id order by start_date) as prev_service_rank
      from t
     ) t;

Grouping together date periods based on two separate rankings/groupings

Answers (1)

Related Questions