Lee White
Lee White

Reputation: 3719

Split ranges of dates using TSQL

I have two tables: one containing ranges of dates (i.e. a ValidFromDate and ValidToDate column) and one containing events (i.e. an EventDate column).

I am looking to write a script that will split each range of dates into multiple records, if an event occurs in between them.

For instance, consider the following.

Ranges:

ValidFrom   | ValidTo    | RangeId
2010-01-01  | 2010-05-01 | 1
2010-05-01  | 2010-05-20 | 2
2010-05-20  | 2017-02-23 | 3
2017-02-23  | 2020-12-31 | 4

Events:

EventDate   | EventId
2010-03-15  | 101
2011-04-15  | 102
2015-05-15  | 103

In this scenario, one event (101) occurs during the span of range 1, so the record should be split into the following. This can either be done by updating the original record and inserting another one, or by deleting the original record and inserting two new ones.

ValidFrom   | ValidTo    | RangeId
2010-01-01  | 2010-03-15 | 1
2010-03-15  | 2010-05-01 | 5

All solutions I have drafted have always had the same problem: if two events occur during 1 range of dates, the script will not process those properly. I could solve this by writing it based on a for-loop that processes each event only after the previous event has finished processing, but this would lead to very poor performance.

Can anyone give me some pointers on how to do this? Perhaps it's a common problem that has a commonly accepted solution?

Upvotes: 3

Views: 222

Answers (2)

AakashM
AakashM

Reputation: 63358

Assumptions: the initial data in Range is contiguous; the EventDates all fall within the range of dates defined by Range.

The key to the approach below is to recognise that all three kinds of date in the input data - ValidFrom, ValidTo, and EventDate - have the same result in the required output. The required output is a list of ranges, but the boundary dates of those ranges come equally and without differentiation from source-Range data and source-Event data.

So, assuming this table structure and sample data:

DECLARE @Range TABLE (
    ValidFrom datetime,
    ValidTo datetime,
    RangeId int 
);

DECLARE @Event TABLE (
    EventDate datetime,
    EventId int
);

INSERT @Range VALUES 
( '20100101', '20100501', 1 ),
( '20100501', '20100520', 2 ),
( '20100520', '20170223', 3 ),
( '20170223', '20201231', 4 )
;

INSERT @Event VALUES
( '20100315', 101 ),
( '20110415', 102 ),
( '20150515', 103 )
;

we combine the source dates:

WITH RelevantDate ( D ) AS (
SELECT ValidFrom FROM @Range
UNION 
SELECT ValidTo FROM @Range
UNION 
SELECT EventDate FROM @Event
)

number them:

, SequencedDate ( D, Sequence ) AS (
SELECT D, ROW_NUMBER() OVER ( ORDER BY D )
FROM RelevantDate
)

and produce the output:

SELECT 
    D1.D AS ValidFrom, 
    D2.D AS ValidTo, 
    D1.Sequence AS RangeId
FROM
    SequencedDate D1
    INNER JOIN SequencedDate D2 ON D1.Sequence + 1 = D2.Sequence
;

This produces a list of Ranges which is intended to replace the existing Range data:

ValidFrom               ValidTo                 RangeId
----------------------- ----------------------- --------------------
2010-01-01 00:00:00.000 2010-03-15 00:00:00.000 1
2010-03-15 00:00:00.000 2010-05-01 00:00:00.000 2
2010-05-01 00:00:00.000 2010-05-20 00:00:00.000 3
2010-05-20 00:00:00.000 2011-04-15 00:00:00.000 4
2011-04-15 00:00:00.000 2015-05-15 00:00:00.000 5
2015-05-15 00:00:00.000 2017-02-23 00:00:00.000 6
2017-02-23 00:00:00.000 2020-12-31 00:00:00.000 7

Upvotes: 1

Abdul Rasheed
Abdul Rasheed

Reputation: 6729

try the below script. The RangeId will keep as same for existing records here.

WITH Result
as  (
    select  ValidFrom,ValidTo,RangeId
    from    [Ranges]
    union
    select  null,EventDate,ROW_NUMBER() OVER(ORDER BY EventDate)    +
            (SELECT MAX(RangeId) FROM Ranges)
    from    [Events]
)  --CTE to get all ValidTo dates as list
select  isnull(lag(ValidTo) over (order by ValidTo),
           (select MIN(ValidFrom) from Ranges)) as ValidFrom,
        ValidTo,RangeId
from    Result
order by ValidTo

Upvotes: 1

Related Questions