Harry
Harry

Reputation: 184

Merging groups of interval data - SQL Server

I have two sets of interval data I.E.

Start End Type1 Type2
0     2   L     NULL
2     5   L     NULL
5     7   L     NULL
7     10  L     NULL
2     3   NULL  S
3     5   NULL  S
5     8   NULL  S
11    12  NULL  S

What I'd like to do is merge these sets into one. This seems possible by utilising an islands and gaps solution but due to the non-continuous nature of the intervals I'm not sure how to go about applying it... The output I'm expecting would be:

Start End Type1 Type2
0     2   L     NULL
2     3   L     S
3     5   L     S
5     7   L     S
7     8   L     S
8     10  L     NULL
11    12  NULL  S

Anyone out there done something like this before??? Thanks!

Create script below:

CREATE TABLE Table1
    ([Start] int, [End] int, [Type1] varchar(4), [Type2] varchar(4))
;

INSERT INTO Table1
    ([Start], [End], [Type1], [Type2])
VALUES
    (0, 2, 'L', NULL),
    (2, 3, NULL, 'S'),
    (2, 5, 'L', NULL),
    (3, 5, NULL, 'S'),
    (5, 7, 'L', NULL),
    (5, 8, NULL, 'S'),
    (7, 10, 'L', NULL),
    (11, 12, NULL, 'S')
;

Upvotes: 3

Views: 177

Answers (3)

Vladimir Baranov
Vladimir Baranov

Reputation: 32703

I assume that Start is inclusive, End is exclusive and given intervals do not overlap.

CTE_Number is a table of numbers. Here it is generated on the fly. I have it as a permanent table in my database.

CTE_T1 and CTE_T2 expand each interval into the corresponding number of rows using a table of numbers. For example, interval [2,5) generates rows with Values

2
3
4

This is done twice: for Type1 and Type2.

Results for Type1 and Type2 are FULL JOINed together on Value.

Finally, a gaps-and-islands pass groups/collapses intervals back.

Run the query step-by-step, CTE-by-CTE and examine intermediate results to understand how it works.

Sample data

I added few rows to illustrate a case when there is a gap between values.

DECLARE @Table1 TABLE
    ([Start] int, [End] int, [Type1] varchar(4), [Type2] varchar(4))
;

INSERT INTO @Table1 ([Start], [End], [Type1], [Type2]) VALUES
( 0,  2, 'L', NULL),
( 2,  3, NULL, 'S'),
( 2,  5, 'L', NULL),
( 3,  5, NULL, 'S'),
( 5,  7, 'L', NULL),
( 5,  8, NULL, 'S'),
( 7, 10, 'L', NULL),
(11, 12, NULL, 'S'),

(15, 20, 'L', NULL),
(15, 20, NULL, 'S');

Query

WITH 
e1(n) AS
(
    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL 
    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL 
    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
) -- 10
,e2(n) AS (SELECT 1 FROM e1 CROSS JOIN e1 AS b) -- 10*10
,e3(n) AS (SELECT 1 FROM e1 CROSS JOIN e2) -- 10*100
,CTE_Numbers
AS
(
    SELECT ROW_NUMBER() OVER (ORDER BY n) AS Number
    FROM e3
)
,CTE_T1
AS
(
    SELECT
        T1.[Start] + CA.Number - 1 AS Value
        ,T1.Type1
    FROM
        @Table1 AS T1
        CROSS APPLY
        (
            SELECT TOP(T1.[End] - T1.[Start]) CTE_Numbers.Number
            FROM CTE_Numbers
            ORDER BY CTE_Numbers.Number
        ) AS CA
    WHERE
        T1.Type1 IS NOT NULL
)
,CTE_T2
AS
(
    SELECT
        T2.[Start] + CA.Number - 1 AS Value
        ,T2.Type2
    FROM
        @Table1 AS T2
        CROSS APPLY
        (
            SELECT TOP(T2.[End] - T2.[Start]) CTE_Numbers.Number
            FROM CTE_Numbers
            ORDER BY CTE_Numbers.Number
        ) AS CA
    WHERE
        T2.Type2 IS NOT NULL
)
,CTE_Values
AS
(
    SELECT
        ISNULL(CTE_T1.Value, CTE_T2.Value) AS Value
        ,CTE_T1.Type1
        ,CTE_T2.Type2
        ,ROW_NUMBER() OVER (ORDER BY ISNULL(CTE_T1.Value, CTE_T2.Value)) AS rn
    FROM
        CTE_T1
        FULL JOIN CTE_T2 ON CTE_T2.Value = CTE_T1.Value
)
,CTE_Groups
AS
(
    SELECT
        Value
        ,Type1
        ,Type2
        ,rn
        ,ROW_NUMBER() OVER 
            (PARTITION BY rn - Value, Type1, Type2 ORDER BY Value) AS rn2
    FROM CTE_Values
)
SELECT
    MIN(Value) AS [Start]
    ,MAX(Value) + 1 AS [End]
    ,Type1
    ,Type2
FROM CTE_Groups
GROUP BY rn-rn2, Type1, Type2
ORDER BY [Start];

Result

+-------+-----+-------+-------+
| Start | End | Type1 | Type2 |
+-------+-----+-------+-------+
|     0 |   2 | L     | NULL  |
|     2 |   8 | L     | S     |
|     8 |  10 | L     | NULL  |
|    11 |  12 | NULL  | S     |
|    15 |  20 | L     | S     |
+-------+-----+-------+-------+

Upvotes: 1

LukStorms
LukStorms

Reputation: 29677

First getting all the numbers of start and end via a Union.
Then joining those numbers on both the 'L' and 'S' records.

Uses a table variable for the test.

DECLARE @Table1 TABLE (Start int, [End] int, Type1 varchar(4), Type2 varchar(4));

INSERT INTO @Table1 (Start, [End], Type1, Type2) 
VALUES (0, 2, 'L', NULL),(2, 3, NULL, 'S'),(2, 5, 'L', NULL),(3, 5, NULL, 'S'),
(5, 7, 'L', NULL),(5, 8, NULL, 'S'),(7, 10, 'L', NULL),(11, 12, NULL, 'S');

select 
n.Num as Start,
(case when s.[End] is null or l.[End] <= s.[End] then l.[End] else s.[End] end) as [End],
l.Type1, 
s.Type2
from
(select Start as Num from @Table1 union select [End] from @Table1) n
left join @Table1 l on (n.Num >= l.Start and n.Num < l.[End] and l.Type1 = 'L')
left join @Table1 s on (n.Num >= s.Start and n.Num < s.[End] and s.Type2 = 'S')
where (l.Start is not null or s.Start is not null)
order by Start, [End];

Output:

Start End Type1 Type2
0     2   L     NULL
2     3   L     S
3     5   L     S
5     7   L     S
7     8   L     S
8     10  L     NULL
11    12  NULL  S

Upvotes: 0

shA.t
shA.t

Reputation: 16978

A step-by-step way is:

-- Finding all break points
;WITH breaks AS (
    SELECT Start
    FROM yourTable
    UNION 
    SELECT [End]
    FROM yourTable
) -- Finding Possible Ends
, ends AS (
    SELECT Start
        , (SELECT Min([End]) FROM yourTable WHERE yourTable.Start = breaks.Start) End1
        , (SELECT Max([End]) FROM yourTable WHERE yourTable.Start < breaks.Start) End2
    FROM breaks
) -- Finding periods
, periods AS (
    SELECT Start, 
        CASE 
            WHEN End1 > End2 And End2 > Start THEN End2
            WHEN End1 IS NULL THEN End2
            ELSE End1
        END [End]
    FROM Ends
    WHERE NOT(End1 IS NULL AND Start = End2)
) -- Generating results
SELECT p.Start, p.[End], Max(Type1) Type1, Max(Type2) Type2
FROM periods p, yourTable t
WHERE p.start >= t.Start AND p.[End] <= t.[End]
GROUP BY p.Start, p.[End];

In above query some situations may not fit at analyzing all of them, you can improve it as you want ;).

Upvotes: 0

Related Questions