Reputation: 83
I have an issue where I need to compare the dates of several rows. The requirement is the data needs to be grouped by 'Region/Area' combination with lowest 'StartDate' and highest 'EndDate' UNLESS there is a gap of more than 1 day between the previous 'EndDate' and the next 'StartDate'.
The 'StartDate' will always be the first of the month and the 'EndDate' will always the last day of the month.
Given a simplified table as such:
Region | Area | StartDate | EndDate
-------|------|---------------|-------------
A | 1 | 01/01/2016 | 03/31/2016
A | 1 | 04/01/2016 | 05/31/2016
A | 1 | 07/01/2016 | 09/30/2016
A | 1 | 10/01/2016 | 01/31/2017
A | 1 | 02/01/2017 | 12/31/2017
B | 2 | 01/01/2016 | 04/30/2016
B | 2 | 05/01/2016 | 09/30/2016
A | 4 | 01/01/2016 | 05/31/2016
A | 4 | 06/01/2016 | 12/31/2016
I need the results to look something like this:
Region | Area | StartDate | EndDate
-------|------|--------------|-----------
A | 1 | 01/01/2016 | 05/31/2016
A | 1 | 07/01/2016 | 12/31/2017
B | 2 | 01/01/2016 | 09/30/2016
A | 4 | 01/01/2016 | 12/31/2016
I have tried GROUP BY with MIN and MAX dates but I cannot seem to get the logic of it correct.
Any thoughts or suggestions would be greatly appreciated.
Upvotes: 2
Views: 107
Reputation: 1739
This seems like a data island problem. You can make use of the windowing functions introduced in SQL Server 2012. Using the LAG
windowing function, you can determine if your last records end date time has a gap greater than a day with the current records start datetime. Next you can use the SUM OVER
clause to generate a grouping id for each of your data islands.
DECLARE @SourceData TABLE
(
Region NVARCHAR(10)
,Area INT
,StartDate DATETIME
,EndDate DATETIME
);
INSERT INTO @SourceData
VALUES
('A', 1, '01/01/2016', '03/31/2016'),
('A', 1, '04/01/2016', '05/31/2016'),
('A', 1, '07/01/2016', '09/30/2016'),
('A', 1, '10/01/2016', '01/31/2017'),
('A', 1, '02/01/2017', '12/31/2017'),
('B', 2, '01/01/2016', '04/30/2016'),
('B', 2, '05/01/2016', '09/30/2016'),
('A', 4, '01/01/2016', '05/31/2016'),
('A', 4, '06/01/2016', '12/31/2016');
;WITH CTE_DataIslands -- First CTE determine the start of each new data island
AS
(
SELECT Region
,Area
,StartDate
,EndDate
,(
CASE
WHEN DATEADD(DAY, 1, LAG(EndDate, 1) OVER (PARTITION BY Region, Area ORDER BY StartDate ASC)) < (StartDate) THEN 1 -- If prev record's end date + 1 day is not equal to current record's start date then it is the start of a new data island.
ELSE 0
END
) AS [IsNewDataIsland]
FROM @SourceData
)
, CTE_GenerateGroupingID
AS
(
SELECT Region
,Area
,StartDate
,EndDate
,SUM([IsNewDataIsland]) OVER (PARTITION BY Region, Area ORDER BY StartDate ASC ROWS UNBOUNDED PRECEDING) AS GroupingID -- Create a running total of the IsNewDataIsland column this will create a grouping id we can now group on
FROM CTE_DataIslands
)
SELECT Region
,Area
,MIN(StartDate) AS StartDate
,MAX(EndDate) AS StartDate
FROM CTE_GenerateGroupingID
GROUP BY Region, Area, GroupingID
Upvotes: 2