Reputation: 23
Hi all sorry about my poorly worded title I am unsure as to how to phrase exactly what I need. But i will try and explain it better below:
I have a dataset that looks like this:
DECLARE @TestDATA TABLE (PERSON_ID int, START_DATE date, END_DATE date,SERVICE_RANK int)
INSERT INTO @TestDATA
VALUES
(123, '2018-01-31', '2018-02-14', 7),
(123, '2018-03-28', '2018-04-11', 4),
(123, '2018-04-12', '2018-04-30', 4),
(123, '2018-05-25', '2018-06-08', 7),
(123, '2018-06-08', '2018-06-15', 7),
(123, '2018-06-19', '2018-06-26', 7),
(123, '2018-06-26', '2018-09-28', 4),
(123, '2018-10-10', '2018-11-07', 7),
(123, '2018-11-27', '2018-12-11', 7),
(123, '2018-12-11', '2018-12-24', 7)
Which shows a date range and "service rank" for each person (there is only one person in this example but there are 10's of thousands in the database)
Where for each person_id and each service_rank I would like to group the date periods to identify how many distinct periods they have had. So in the above example this is what I would be looking for:
PERSON ID, START_DATE, END_DATE, SERVICE_RANK, SERVICE_PERIOD
123 2018-01-31 2018-02-14 7 1
123 2018-03-28 2018-04-11 4 2
123 2018-04-12 2018-04-30 4 2
123 2018-05-25 2018-06-08 7 3
123 2018-06-08 2018-06-15 7 3
123 2018-06-19 2018-06-26 7 3
123 2018-06-26 2018-09-28 4 4
123 2018-10-10 2018-11-07 7 5
123 2018-11-27 2018-12-11 7 5
123 2018-12-11 2018-12-24 7 5
I have tried row_number, rank, dense_rank and even had a go at the dreaded CURSOR FOR but I cannot get anything work as the windowed functions see the service ranks as the same, so for the above example it would see two service ranks when there are actually 5 they just share the same numbering.
Also in the dataset not every person will jump from one service_rank to another and back. They may go from one to another (eg 4 -> 7) and stay there or they may only have one service_rank over multiple rows.
Any ideas??
Upvotes: 2
Views: 39
Reputation: 1270401
This is a gaps-and-islands problem. For this purpose, one method is lag()
and a cumulative sum:
select t.*,
sum(case when prev_service_rank = service_rank then 0 else 1 end) over (partition by person_id order by start_date) as service_period
from (select t.*,
lag(service_rank) over (partition by person_id order by start_date) as prev_service_rank
from t
) t;
Upvotes: 2