AR.
AR.

Reputation: 40327

How to merge contiguous intervals in SQL Server

I'm trying to find an efficient way to merge/combine contiguous intervals defined by a category. For example, from this source table:

TYPE   START    END    CATEGORY
----   -----    ----   --------
X         0        5      A
X         5       10      B
X        10       15      B
X        15       20      C
X        20       25      C
Y         0        6      A
Y         6       12      A
Y        12       20      B
Y        20       22      B
Y        24       27      B
Y        27       30      B
Y        35       40      C

Query output should be:

TYPE   START    END    CATEGORY
----   -----    ----   --------
X         0        5      A
X         5       15      B
X        15       25      C
Y         0       12      A
Y        12       22      B
Y        24       30      B
Y        35       40      C

Note that for type Y there's a gap between 22 and 24, so a new combined interval is required.

Using SQL Server 2008R2. Data volume is in the thousands of records, so not too huge.

It feels like this is a job for recursive CTEs, but I'm having trouble wrapping my head around it. Any help would be appreciated!

Upvotes: 2

Views: 266

Answers (1)

Ben Thul
Ben Thul

Reputation: 32737

This is commonly referred to as a "gaps and islands" problem. Here's my stab at it:

with cte as (
    select * from (values
        ('X',         0,        5,      'A'),
        ('X',         5,       10,      'B'),
        ('X',        10,       15,      'B'),
        ('X',        15,       20,      'C'),
        ('X',        20,       25,      'C'),
        ('Y',         0,        6,      'A'),
        ('Y',         6,       12,      'A'),
        ('Y',        12,       20,      'B'),
        ('Y',        20,       22,      'B'),
        ('Y',        24,       27,      'B'),
        ('Y',        27,       30,      'B'),
        ('Y',        35,       40,      'C')
    ) as x( [Type], [Start], [End], [Category] )
), with_grp as (
    select *, n.Number - dense_rank() over (partition by [Type], [Category] order by n.Number ) as grp
    from cte
    join dbadmin.dbo.Numbers as n
        on n.Number >= cte.Start
        and n.Number <= cte.[End]
)
select [Type], [Category], min([Start]), max([End])
from with_grp
group by [Type], [Category], [grp]
order by [Type], [Category], min([Start])

Where dbadmin.dbo.Numbers is a tally table (a table with a single column and the rows being the numbers 1-(some large-ish number)). To uncover the magic, do a raw select from the with_grp cte. But that should do it.

Upvotes: 3

Related Questions