Reputation: 3719
I have two tables: one containing ranges of dates (i.e. a ValidFromDate
and ValidToDate
column) and one containing events (i.e. an EventDate
column).
I am looking to write a script that will split each range of dates into multiple records, if an event occurs in between them.
For instance, consider the following.
Ranges:
ValidFrom | ValidTo | RangeId
2010-01-01 | 2010-05-01 | 1
2010-05-01 | 2010-05-20 | 2
2010-05-20 | 2017-02-23 | 3
2017-02-23 | 2020-12-31 | 4
Events:
EventDate | EventId
2010-03-15 | 101
2011-04-15 | 102
2015-05-15 | 103
In this scenario, one event (101
) occurs during the span of range 1
, so the record should be split into the following. This can either be done by updating the original record and inserting another one, or by deleting the original record and inserting two new ones.
ValidFrom | ValidTo | RangeId
2010-01-01 | 2010-03-15 | 1
2010-03-15 | 2010-05-01 | 5
All solutions I have drafted have always had the same problem: if two events occur during 1 range of dates, the script will not process those properly. I could solve this by writing it based on a for-loop that processes each event only after the previous event has finished processing, but this would lead to very poor performance.
Can anyone give me some pointers on how to do this? Perhaps it's a common problem that has a commonly accepted solution?
Upvotes: 3
Views: 222
Reputation: 63358
Assumptions: the initial data in Range
is contiguous; the EventDate
s all fall within the range of dates defined by Range
.
The key to the approach below is to recognise that all three kinds of date in the input data - ValidFrom
, ValidTo
, and EventDate
- have the same result in the required output. The required output is a list of ranges, but the boundary dates of those ranges come equally and without differentiation from source-Range data and source-Event data.
So, assuming this table structure and sample data:
DECLARE @Range TABLE (
ValidFrom datetime,
ValidTo datetime,
RangeId int
);
DECLARE @Event TABLE (
EventDate datetime,
EventId int
);
INSERT @Range VALUES
( '20100101', '20100501', 1 ),
( '20100501', '20100520', 2 ),
( '20100520', '20170223', 3 ),
( '20170223', '20201231', 4 )
;
INSERT @Event VALUES
( '20100315', 101 ),
( '20110415', 102 ),
( '20150515', 103 )
;
we combine the source dates:
WITH RelevantDate ( D ) AS (
SELECT ValidFrom FROM @Range
UNION
SELECT ValidTo FROM @Range
UNION
SELECT EventDate FROM @Event
)
number them:
, SequencedDate ( D, Sequence ) AS (
SELECT D, ROW_NUMBER() OVER ( ORDER BY D )
FROM RelevantDate
)
and produce the output:
SELECT
D1.D AS ValidFrom,
D2.D AS ValidTo,
D1.Sequence AS RangeId
FROM
SequencedDate D1
INNER JOIN SequencedDate D2 ON D1.Sequence + 1 = D2.Sequence
;
This produces a list of Range
s which is intended to replace the existing Range
data:
ValidFrom ValidTo RangeId
----------------------- ----------------------- --------------------
2010-01-01 00:00:00.000 2010-03-15 00:00:00.000 1
2010-03-15 00:00:00.000 2010-05-01 00:00:00.000 2
2010-05-01 00:00:00.000 2010-05-20 00:00:00.000 3
2010-05-20 00:00:00.000 2011-04-15 00:00:00.000 4
2011-04-15 00:00:00.000 2015-05-15 00:00:00.000 5
2015-05-15 00:00:00.000 2017-02-23 00:00:00.000 6
2017-02-23 00:00:00.000 2020-12-31 00:00:00.000 7
Upvotes: 1
Reputation: 6729
try the below script. The RangeId
will keep as same for existing records here.
WITH Result
as (
select ValidFrom,ValidTo,RangeId
from [Ranges]
union
select null,EventDate,ROW_NUMBER() OVER(ORDER BY EventDate) +
(SELECT MAX(RangeId) FROM Ranges)
from [Events]
) --CTE to get all ValidTo dates as list
select isnull(lag(ValidTo) over (order by ValidTo),
(select MIN(ValidFrom) from Ranges)) as ValidFrom,
ValidTo,RangeId
from Result
order by ValidTo
Upvotes: 1