Reputation: 45
Given two timelines, A and B, where each timeline consist of non-overlapping periods and a score is associated with each period. I want to find all the periods and maximum score for the overlap between the timelines, while keeping the periods that does not have an overlap.
I.e. for the periods that overlap, I want the start, end and maximum score of A and B. For the periods that does not overlap, i still want the start and end, and the associated score.
Periods in each timeline can line up with each other. For cases where the timelines overlap, the score should be consided included in the start and end points.
An example is shown in the diagram below where with numbers for reference, and the score shown for each interval. A, B are timelines, and R is the resulting timeline i want to calculate.
0 1 2 3 4 5
12345678901234567890123456789012345678901234567890
A = |---------| |--| |--------| |--|
B = |--------|--------| |--------| |--| |--|
R = |---|----|----|---| |--|--|--| |--|--|--| |--||--|
Scores per interval:
A = |---10----| |20| |---30---| |40|
B = |---6----|---12---| |---17---| |33| |50|
R = |6--|10--|12--|12-| |17|20|17| |30|33|30| |40||50|
DROP TABLE IF EXISTS #TABLE_A
CREATE TABLE #TABLE_A (
timeline varchar(1),
time_start int,
time_end int,
score int
)
DROP TABLE IF EXISTS #TABLE_B
CREATE TABLE #TABLE_B (
timeline varchar(1),
time_start int,
time_end int,
score int
)
DROP TABLE IF EXISTS #TABLE_R
CREATE TABLE #TABLE_R (
time_start int,
time_end int,
score int
)
INSERT INTO #TABLE_A
VALUES
('A', 5, 15, 10)
,('A', 24, 27, 20)
,('A', 32, 41, 30)
,('A', 43, 46, 40)
INSERT INTO #TABLE_B
VALUES
('B', 1, 10, 6)
,('B', 10, 19, 12)
,('B', 21, 30, 17)
,('B', 35, 38, 33)
,('B', 47, 50, 50)
INSERT INTO #TABLE_R
VALUES
(1, 5, 6)
,(5, 10, 10)
,(10, 15, 12)
,(15, 19, 12)
,(21, 24, 17)
,(24, 27, 20)
,(27, 30, 17)
,(32, 35, 30)
,(35, 38, 33)
,(38, 41, 30)
,(43, 46, 40)
,(47, 50, 50)
I have tried to find all the cases where the periods overlap to determine the start and end of the period as well as the score. For scenarios where there is no overlap the period is difficult to select. I have taken the union for all the scenarios, which is quite a lot. I am therefore looking for an elegant solution to split the timelines into periods.
I found the following solution, which does not require creating the entire value range. For this simple example it does not really matter, but for the real world problem i was trying to solve where the times are timestamps it makes a difference. And for large datasets the recursion limit is quicly reached.
The solution creates all the points including intersections, gives the row a number and matches the point by the next in the row. For each pair, the score is then checked to see which is greatest.
WITH points AS (
SELECT time_start AS 'point' FROM #TABLE_A
UNION
SELECT time_end FROM #TABLE_A
UNION
SELECT time_start FROM #TABLE_B
UNION
SELECT time_end FROM #TABLE_B
),
points_row as (
SELECT
point
,ROW_NUMBER() OVER(ORDER BY point) AS row_num
FROM points
),
time_periods AS (
SELECT
s.point AS 'time_start'
,e.point AS 'time_end'
FROM points_row s
LEFT JOIN points_row e on s.row_num + 1 = e.row_num
)
SELECT * FROM (
SELECT
tp.time_start
,tp.time_end
,CASE
WHEN A.score IS NULL AND B.score IS NULL THEN NULL
WHEN A.score IS NULL THEN B.score
WHEN B.score IS NULL THEN A.score
WHEN A.score > B.score THEN A.score else B.score
END AS 'score'
FROM time_periods tp
LEFT JOIN #TABLE_A A ON A.time_start <= tp.time_start AND A.time_end >= tp.time_end
LEFT JOIN #TABLE_B B ON B.time_start <= tp.time_start AND B.time_end >= tp.time_end
) a
WHERE score IS NOT NULL
Upvotes: 2
Views: 177
Reputation: 71471
The query below first produces the total value range with a recursive cte
, then generates the intervals with maximum score:
with cte(s, e) as (
select (select min(least(a.time_start, b.time_start)) from #TABLE_A a
cross join #TABLE_B b),
(select max(greatest(a.time_end, b.time_end)) from #TABLE_A a
cross join #TABLE_B b)
union all
select c.s+1, c.e from cte c where c.s < e
),
intervals(s, v) as (
select c.s, max(greatest(coalesce(a.score, 0), coalesce(b.score, 0))) v
from cte c left join #TABLE_A a on a.time_start <= c.s and c.s <= a.time_end
left join #TABLE_B b on b.time_start <= c.s and c.s <= b.time_end
group by c.s
),
blocks(r, s, e, v, o, i, k) as (
select t1.r, min(t1.s), max(t1.s), max(t1.v), min(t1.s), 0, 1 from (
select i.*, (select count(*) from intervals i1 where i1.s < i.s and i1.v != i.v) r
from intervals i) t1
group by t1.r
union all
select b.r, b.s, b.e, b.v, b.o + 1, case when b.k%5 = 0 then b.i + 1 else b.i end, b.k + 1
from blocks b where b.o < b.e
)
select t2.s, t2.e, t2.v from (select b.r, b.i, min(b.o) s, max(b.o) e, max(b.v) v from blocks b
group by b.r, b.i) t2
where t2.v != 0 order by t2.r
Upvotes: 0