Beryl
Beryl

Reputation: 181

SQL Server split overlapping date ranges

I need to split date ranges that overlap. I have a primary table (I've called it Employment for this example), and I need to return all Begin-End date ranges for a person from this table. I also have multiple sub tables (represented by Car and Food), and I want to return the value that was active in the sub tables during the times given in the main tables. This will involve splitting the main table date ranges when a sub table item changes.

I don't want to return sub table information for dates not in the main tables.

DECLARE @Employment TABLE
( Person_ID INT, Employment VARCHAR(50), Begin_Date DATE, End_Date DATE )

DECLARE @Car TABLE
( Person_ID INT, Car VARCHAR(50), Begin_Date DATE, End_Date DATE )

DECLARE @Food TABLE
( Person_ID INT, Food VARCHAR(50), Begin_Date DATE, End_Date DATE )


INSERT INTO @Employment ( [Person_ID], [Employment], [Begin_Date], [End_Date] )
VALUES  ( 123 , 'ACME' , '1986-01-01' , '1990-12-31' )
,       ( 123 , 'Office Corp' , '1995-05-15' , '1998-10-03' )
,       ( 123 , 'Job 3' , '1998-10-04' , '2999-12-31' )

INSERT INTO @Car ( [Person_ID] , [Car] , [Begin_Date] , [End_Date] )
VALUES  ( 123, 'Red Car', '1986-05-01', '1997-06-23' )
,       ( 123, 'Blue Car', '1997-07-03', '2999-12-31' )

INSERT INTO @Food ( [Person_ID], [Food], [Begin_Date], [End_Date] )
VALUES  ( 123, 'Eggs', '1997-01-01', '1997-03-09' )
,       ( 123, 'Donuts', '2001-02-23', '2001-02-25' )

For the above data, the results should be:

Person_ID    Employment      Food        Car        Begin_Date    End_Date
123          ACME                                   1986-01-01    1986-04-30
123          ACME                        Red Car    1986-05-01    1990-12-31
123          Office Corp                 Red Car    1995-05-15    1996-12-31
123          Office Corp     Eggs        Red Car    1997-01-01    1997-03-09
123          Office Corp                 Red Car    1997-03-10    1997-06-23
123          Office Corp                            1997-06-24    1997-07-02
123          Office Corp                 Blue Car   1997-07-03    1998-10-03
123          Job 3                       Blue Car   1998-10-04    2001-02-22
123          Job 3           Donuts      Blue Car   2001-02-23    2001-02-25
123          Job 3                       Blue Car   2001-02-26    2999-12-31

The first row is his time working for ACME, where he didn't have a car or a weird food obsession. In the second row, he purchased a car, and still worked at ACME. In the third row, he changed jobs to Office Corp, but still has the Red Car. Note how we're not returning data during his unemployment gap, even though he had the Red Car. We only want to know what was in the Car and Food tables during the times there are values in the Employment table.

I found a solution for SQL Server 2012 that uses the LEAD/LAG functions to accomplish this, but I'm stuck with 2008 R2.

Upvotes: 1

Views: 2136

Answers (2)

Beryl
Beryl

Reputation: 181

Here is the answer I came up with. It works, but it's not very pretty.

It goes it two waves, first splitting any overlapping Employment/Car dates, then running the same SQL a second time add the Food dates and split any overlaps again.

DECLARE @Employment TABLE
( Person_ID INT, Employment VARCHAR(50), Begin_Date DATE, End_Date DATE )

DECLARE @Car TABLE
( Person_ID INT, Car VARCHAR(50), Begin_Date DATE, End_Date DATE )

DECLARE @Food TABLE
( Person_ID INT, Food VARCHAR(50), Begin_Date DATE, End_Date DATE )


INSERT INTO @Employment ( [Person_ID], [Employment], [Begin_Date], [End_Date] )
VALUES  ( 123 , 'ACME' , '1986-01-01' , '1990-12-31' )
,       ( 123 , 'Office Corp' , '1995-05-15' , '1998-10-03' )
,       ( 123 , 'Job 3' , '1998-10-04' , '2999-12-31' )

INSERT INTO @Car ( [Person_ID] , [Car] , [Begin_Date] , [End_Date] )
VALUES  ( 123, 'Red Car', '1986-05-01', '1997-06-23' )
,       ( 123, 'Blue Car', '1997-07-03', '2999-12-31' )

INSERT INTO @Food ( [Person_ID], [Food], [Begin_Date], [End_Date] )
VALUES  ( 123, 'Eggs', '1997-01-01', '1997-03-09' )
,       ( 123, 'Donuts', '2001-02-23', '2001-02-25' )


DECLARE @Person_ID INT = 123;


--A table to hold date ranges that need to be merged together
DECLARE @DatesToMerge TABLE 
(
   ID INT,
   Person_ID INT,
   Date_Type VARCHAR(10),
   Begin_Date DATETIME,
   End_Date DATETIME
)

INSERT INTO @DatesToMerge
SELECT ROW_NUMBER() OVER(ORDER BY [Car])
     , Person_ID
     , 'Car'
     , Begin_Date
     , End_Date
FROM @Car
WHERE Person_ID = @Person_ID

INSERT INTO @DatesToMerge 
SELECT ROW_NUMBER() OVER(ORDER BY [Employment])
     , Person_ID
     , 'Employment'
     , Begin_Date
     , End_Date
FROM @Employment
WHERE Person_ID = @Person_ID;


--A table to hold the merged @Employment and Car records
DECLARE @EmploymentAndCar TABLE
(
    RowNumber INT,
    Person_ID INT,
    Begin_Date DATETIME,
    End_Date DATETIME
)

;
WITH CarCTE AS 
(--This CTE grabs just the Car rows so we can compare and split dates from them
    SELECT  ID,
            Person_ID,
            Date_Type,
            Begin_Date,
            End_Date
    FROM @DatesToMerge
    WHERE Date_Type = 'Car'
),

NewRowsCTE AS 
( --This CTE creates just new rows starting after the Car dates for each @Employment date range
    SELECT  a.ID,
            a.Person_ID,
            a.Date_Type,
            DATEADD(DAY,1,b.End_Date) AS Begin_Date,
            a.End_Date
    FROM @DatesToMerge a
    INNER JOIN CarCTE b
    ON a.Begin_Date <= b.Begin_Date
    AND a.End_Date > b.Begin_Date
    AND a.End_Date > b.End_Date -- This is needed because if both the Car and @Employment end on the same date, there is split row after 
),

UnionCTE AS 
( -- This CTE merges the new rows with the existing ones
    SELECT  ID,
            Person_ID,
            Date_Type,
            Begin_Date,
            End_Date
    FROM @DatesToMerge
    UNION ALL
    SELECT  ID,
            Person_ID,
            Date_Type,
            Begin_Date,
            End_Date
    FROM NewRowsCTE
),

FixEndDateCTE AS 
(
    SELECT  CONVERT (CHAR,c.ID)+CONVERT (CHAR,c.Begin_Date) AS FixID, 
            MIN(d.Begin_Date) AS Begin_Date
    FROM UnionCTE c
    LEFT OUTER JOIN CarCTE d
    ON c.Begin_Date < d.Begin_Date
    AND c.End_Date >= d.Begin_Date
    WHERE c.Date_Type <> 'Car'
    GROUP BY CONVERT (CHAR,c.ID)+CONVERT (CHAR,c.Begin_Date)
),

Finalize AS
(
    SELECT  ROW_NUMBER() OVER (ORDER BY e.Begin_Date) AS RowNumber,
            e.Person_ID,
            e.Begin_Date,
            CASE    WHEN f.Begin_Date IS NULL THEN e.End_Date
                    ELSE DATEADD (DAY,-1,f.Begin_Date)
                    END AS EndDate
    FROM UnionCTE e
    LEFT OUTER JOIN FixEndDateCTE f
    ON (CONVERT (CHAR,e.ID)+CONVERT (CHAR,e.Begin_Date)) = f.FixID 
)

INSERT INTO @EmploymentAndCar ( RowNumber, Person_ID, Begin_Date, End_Date )

SELECT F.RowNumber
     , F.Person_ID
     , F.Begin_Date
     , F.EndDate
FROM Finalize F
INNER JOIN @Employment Employment
ON F.Begin_Date BETWEEN Employment.Begin_Date AND Employment.End_Date AND Employment.Person_ID = @Person_ID
ORDER BY F.Begin_Date


--------------------------------------------------------------------------------------------------
--Now that the Employment and Car dates have been merged, empty the DatesToMerge table
DELETE FROM @DatesToMerge;

--Reload the DatesToMerge table with the newly-merged Employment and Car records, 
--and the Food records that still need to be merged
INSERT INTO @DatesToMerge
SELECT RowNumber
     , Person_ID
     , 'PtBCar'
     , Begin_Date
     , End_Date
FROM @EmploymentAndCar
WHERE Person_ID = @Person_ID

INSERT INTO @DatesToMerge 
SELECT ROW_NUMBER() OVER(ORDER BY [Food]) 
     , Person_ID
     , 'Food'
     , Begin_Date
     , End_Date
FROM @Food
WHERE Person_ID = @Person_ID


;
WITH CarCTE AS 
(--This CTE grabs just the Food rows so we can compare and split dates from them
    SELECT  ID,
            Person_ID,
            Date_Type,
            Begin_Date,
            End_Date
    FROM @DatesToMerge
    WHERE Date_Type = 'Food'
),

NewRowsCTE AS 
( --This CTE creates just new rows starting after the Food dates for each Employment date range
    SELECT  a.ID,
            a.Person_ID,
            a.Date_Type,
            DATEADD(DAY,1,b.End_Date) AS Begin_Date,
            a.End_Date
    FROM @DatesToMerge a
    INNER JOIN CarCTE b
    ON a.Begin_Date <= b.Begin_Date
    AND a.End_Date > b.Begin_Date
    AND a.End_Date > b.End_Date -- This is needed because if both the Food and Car/Employment end on the same date, there is split row after 
),

UnionCTE AS 
( -- This CTE merges the new rows with the existing ones
    SELECT  ID,
            Person_ID,
            Date_Type,
            Begin_Date,
            End_Date
    FROM @DatesToMerge
    UNION ALL
    SELECT  ID,
            Person_ID,
            Date_Type,
            Begin_Date,
            End_Date
    FROM NewRowsCTE
),

FixEndDateCTE AS 
(
    SELECT  CONVERT (CHAR,c.ID)+CONVERT (CHAR,c.Begin_Date) AS FixID, 
            MIN(d.Begin_Date) AS Begin_Date
    FROM UnionCTE c
    LEFT OUTER JOIN CarCTE d
    ON c.Begin_Date < d.Begin_Date
    AND c.End_Date >= d.Begin_Date
    WHERE c.Date_Type <> 'Food'
    GROUP BY CONVERT (CHAR,c.ID)+CONVERT (CHAR,c.Begin_Date)
),

Finalize AS
(
    SELECT  ROW_NUMBER() OVER (ORDER BY e.Begin_Date) AS RowNumber,
            e.Person_ID,
            e.Begin_Date,
            CASE    WHEN f.Begin_Date IS NULL THEN e.End_Date
                    ELSE DATEADD (DAY,-1,f.Begin_Date)
                    END AS EndDate
    FROM UnionCTE e
    LEFT OUTER JOIN FixEndDateCTE f
    ON (CONVERT (CHAR,e.ID)+CONVERT (CHAR,e.Begin_Date)) = f.FixID 
)

SELECT DISTINCT
       F.Person_ID
     , Employment
     , Car
     , Food
     , F.Begin_Date
     , F.EndDate
FROM Finalize F
INNER JOIN @Employment Employment
ON F.Begin_Date BETWEEN Employment.Begin_Date AND Employment.End_Date AND Employment.Person_ID = @Person_ID

LEFT JOIN @Car Car
ON Car.[Begin_Date] <= F.Begin_Date 
AND Car.[End_Date] >= F.[EndDate]
AND Car.Person_ID = @Person_ID

LEFT JOIN @Food Food
ON Food.[Begin_Date] <= F.[Begin_Date] 
AND Food.[End_Date] >= F.[EndDate]
AND Food.Person_ID = @Person_ID

ORDER BY F.Begin_Date

If anyone has a more elegant solution, I will be happy to accept their answer.

Upvotes: 0

DancingFool
DancingFool

Reputation: 1267

To change the 2012 solution from that blog to work with 2008, you need to replace the LEAD in the following

with
ValidDates as …
,
ValidDateRanges1 as
(
select EmployeeNo, Date as ValidFrom, lead(Date,1) over (partition by EmployeeNo order by Date) ValidTo
from ValidDates
)

There are a number of ways to do this, but one example is a self join to the same table + 1 row (which is effectively what a lead does). One way to do this is to put a rownumber on the previous table (so it is easy to find the next row) by adding another intermediate CTE (eg ValidDatesWithRowno). Then do a left outer join to that table where EmployeeNo is the same and rowno = rowno + 1, and use that value to replace the lead. If you wanted a lead 2, you would join to rowno + 2, etc. So the 2008 version would look something like

with
ValidDates as …
,
ValidDatesWithRowno as  --This is the ValidDates + a RowNo for easy self joining below
(
select EmployeeNo, Date, ROW_NUMBER() OVER (ORDER BY EmployeeNo, Date) as RowNo from ValidDates
)
,
ValidDateRanges1 as
(
select VD.EmployeeNo, VD.Date as ValidFrom, VDLead1.Date as ValidTo
from ValidDatesWithRowno VD
    left outer join ValidDatesWithRowno VDLead1 on VDLead1.EmployeeNo = VD.EmployeeNo 
        and VDLead1.RowNo = VD.RowNo + 1
)

The rest of the solution described looks like it will work like you want on 2008.

Upvotes: 2

Related Questions