Reputation: 249
I need some help with grouping based on date-ranges, where I do not have any obvious Partition by seperator. I have this dataset (Table New_Test), where every ID has an unknown amount of month+year entries but if they are connectected, they should get the same grouping ID (in a new column). E.g. If Month 9, Year 2016 is followed by Month 10, Year 2016 they should both get the Grouping-ID 1. If there is a gap, which is the case from Month 3, 2018 to Month 5, 2018 a new Grouping ID should be assigned.
To find the consecutive order of month, I draw out Match1 and Match2 (helper-columns) which is a lead and lag function of the previous and next Start- and EndDate.
To assign a Grouping-ID I've tried an IIF-formula, where a match between either StartDate = Match2 or EndDate = Match1 is assigned the value 1 or else 0. I've tried substituting the 1 with various versions of Dense_rank, rank, Row_number. If I use Dense_Rank() OVER (PARTITION BY ID ORDER BY ID), I get the Grouping-ID values 1,0,1 for the three group ranges regarding ID 1 instead of the goal of 1,2,3 as I have no usable seperator in my dataset. This means that when I later on wants to group these data by date-range MIN and MAX these 2 islands will be combined into 1 which is not what I want.
I hope somebody has some great inputs to this! :)
SELECT
ID
,StartDate
,EndDate
,LEAD(DATEADD(day,-1,StartDate),1) OVER (ORDER BY ID, Year, Month) AS Match1
,LAG(DATEADD(day,1,EndDate),1) OVER (ORDER BY ID, Year, Month) AS Match2
,IIF(StartDate= LAG(DATEADD(day,1,EndDate),1) OVER (ORDER BY ID, Year, Month)
OR EndDate =LEAD(DATEADD(day,-1,StartDate),1) OVER (ORDER BY ID, Year, Month)
,1,0) AS Grouping-ID
,Year
,Month
FROM NEW_Test
My data looks like after first edit:
ID StartDate EndDate Match1 Match2 Year Month
1 01-09-2016 30-09-2016 30-09-2016 NULL 2016 9
1 01-10-2016 31-10-2016 31-10-2016 01-10-2016 2016 10
1 01-11-2016 30-11-2016 30-11-2016 01-11-2016 2016 11
1 01-12-2016 31-12-2016 31-12-2016 01-12-2016 2016 12
1 01-01-2017 31-01-2017 31-01-2017 01-01-2017 2017 1
1 01-02-2017 28-02-2017 28-02-2017 01-02-2017 2017 2
1 01-03-2017 31-03-2017 31-03-2017 01-03-2017 2017 3
1 01-04-2017 30-04-2017 30-04-2017 01-04-2017 2017 4
1 01-05-2017 31-05-2017 31-05-2017 01-05-2017 2017 5
1 01-06-2017 30-06-2017 30-06-2017 01-06-2017 2017 6
1 01-07-2017 31-07-2017 31-07-2017 01-07-2017 2017 7
1 01-08-2017 31-08-2017 31-08-2017 01-08-2017 2017 8
1 01-09-2017 30-09-2017 30-09-2017 01-09-2017 2017 9
1 01-10-2017 31-10-2017 31-10-2017 01-10-2017 2017 10
1 01-11-2017 30-11-2017 30-11-2017 01-11-2017 2017 11
1 01-12-2017 31-12-2017 31-12-2017 01-12-2017 2017 12
1 01-01-2018 31-01-2018 31-01-2018 01-01-2018 2018 1
1 01-02-2018 28-02-2018 28-02-2018 01-02-2018 2018 2
1 01-03-2018 31-03-2018 30-04-2018 01-03-2018 2018 3
1 01-05-2018 31-05-2018 31-10-2018 01-04-2018 2018 5
1 01-11-2018 30-11-2018 30-11-2018 01-06-2018 2018 11
1 01-12-2018 31-12-2018 NULL 01-12-2018 2018 12
2 01-09-2016 30-09-2016 30-09-2016 NULL 2016 9
2 01-10-2016 31-10-2016 31-10-2016 01-10-2016 2016 10
2 01-11-2016 30-11-2016 30-11-2016 01-11-2016 2016 11
2 01-01-2017 31-01-2017 31-01-2017 01-01-2017 2017 1
2 01-02-2017 28-02-2017 28-02-2017 01-02-2017 2017 2
2 01-03-2017 31-03-2017 31-03-2017 01-03-2017 2017 3
2 01-04-2017 30-04-2017 30-04-2017 01-04-2017 2017 4
2 01-05-2017 31-05-2017 31-05-2017 01-05-2017 2017 5
What should be the end-result:
ID StartDate EndDate Match1 Match2 Year Month Grouping-ID
1 01-09-2016 30-09-2016 30-09-2016 NULL 2016 9 1
1 01-10-2016 31-10-2016 31-10-2016 01-10-2016 2016 10 1
1 01-11-2016 30-11-2016 30-11-2016 01-11-2016 2016 11 1
1 01-12-2016 31-12-2016 31-12-2016 01-12-2016 2016 12 1
1 01-01-2017 31-01-2017 31-01-2017 01-01-2017 2017 1 1
1 01-02-2017 28-02-2017 28-02-2017 01-02-2017 2017 2 1
1 01-03-2017 31-03-2017 31-03-2017 01-03-2017 2017 3 1
1 01-04-2017 30-04-2017 30-04-2017 01-04-2017 2017 4 1
1 01-05-2017 31-05-2017 31-05-2017 01-05-2017 2017 5 1
1 01-06-2017 30-06-2017 30-06-2017 01-06-2017 2017 6 1
1 01-07-2017 31-07-2017 31-07-2017 01-07-2017 2017 7 1
1 01-08-2017 31-08-2017 31-08-2017 01-08-2017 2017 8 1
1 01-09-2017 30-09-2017 30-09-2017 01-09-2017 2017 9 1
1 01-10-2017 31-10-2017 31-10-2017 01-10-2017 2017 10 1
1 01-11-2017 30-11-2017 30-11-2017 01-11-2017 2017 11 1
1 01-12-2017 31-12-2017 31-12-2017 01-12-2017 2017 12 1
1 01-01-2018 31-01-2018 31-01-2018 01-01-2018 2018 1 1
1 01-02-2018 28-02-2018 28-02-2018 01-02-2018 2018 2 1
1 01-03-2018 31-03-2018 30-04-2018 01-03-2018 2018 3 1
1 01-05-2018 31-05-2018 31-10-2018 01-04-2018 2018 5 2
1 01-11-2018 30-11-2018 30-11-2018 01-06-2018 2018 11 3
1 01-12-2018 31-12-2018 NULL 01-12-2018 2018 12 3
2 01-09-2016 30-09-2016 30-09-2016 NULL 2016 9 4
2 01-10-2016 31-10-2016 31-10-2016 01-10-2016 2016 10 4
2 01-11-2016 30-11-2016 30-11-2016 01-11-2016 2016 11 4
2 01-01-2017 31-01-2017 31-01-2017 01-01-2017 2017 1 5
2 01-02-2017 28-02-2017 28-02-2017 01-02-2017 2017 2 5
2 01-03-2017 31-03-2017 31-03-2017 01-03-2017 2017 3 5
2 01-04-2017 30-04-2017 30-04-2017 01-04-2017 2017 4 5
2 01-05-2017 31-05-2017 31-05-2017 01-05-2017 2017 5 5
Upvotes: 2
Views: 104
Reputation: 249
Thank you very much GBM! You've managed to solve my problem by pointing my in the right direction! I needed to add one more AND clause if I wanted to group the date-ranges by one ID in both directions - meaning:
SELECT
x.*
,1 + rn - SUM(matched) OVER(ORDER BY id, rn) AS GroupingID
FROM (
SELECT
t.*
,ROW_NUMBER() OVER(ORDER BY id, StartDate) rn,
,CASE
WHEN
ID = LEAD(ID) OVER(ORDER BY ID, StartDate)
AND (DATEADD(day, 1, EndDate) = LEAD(StartDate) OVER(ORDER BY ID, StartDate)
AND DATEADD(day, -1, StartDate) = LAG(EndDate) OVER(ORDER BY ID, StartDate))
THEN 1
ELSE 0
END AS Matched
FROM New_test AS t
) x
Upvotes: 0
Reputation: 222402
This is a variation of the gaps-and-island problem.
I would proceed as follows:
ROW_NUMER()
, and set up a flagged that checks if the start date of the next record is contiguous to the end of the current record and if it has the same idConsider:
SELECT
x.*
1 + rn - SUM(matched) OVER(ORDER BY id, rn) AS GroupingID
FROM (
SELECT
t.*
ROW_NUMBER() OVER(ORDER BY id, StartDate) rn,
CASE
WHEN
id = LEAD(id) OVER(ORDER BY id, StartDate)
AND DATEADD(day, 1, EndDate) = LEAD(StartDate) OVER(ORDER BY id, StartDate)
THEN 1
ELSE 0
END matched
FROM mytable
) x
Upvotes: 1