Reputation: 1457

SQL - Consolidate Overlapping Data

I have a simple data set in SQL Server that appears like this

**ROW    Start    End**
  0     1        2
  1     3        5
  2     4        6
  3     8        9

Graphically, the data would appear like this enter image description here

What I would like to achieve is to collapse the overlapping data so that my query returns

**ROW    Start    End**
  0     1        2
  1     3        6
  2     8        9

Is this possible in SQL Server without having to write a complex procedure or statement?

Upvotes: 4

Answers (3)

Joseph B

Reputation: 5669

Here's the SQL Fiddle for another alternative.

First, all the limits are sorted by order. Then the "duplicate" limits within an overlapping range are removed (because a Start is followed by another Start or an End is followed by another End). Now, that the ranges are collapsed, the Start and End values are written out again in the same row.

with temp_positions as  --Select all limits as a single column along with the start / end flag (s / e)
(
    select startx limit, 's' as pos from t
    union
    select endx, 'e' as pos from t
)
, ordered_positions as --Rank all limits
(
    select limit, pos, RANK() OVER (ORDER BY limit) AS Rank
    from temp_positions
)
, collapsed_positions as --Collapse ranges (select the first limit, if s is preceded or followed by e, and the last limit) and rank limits again
(
    select op1.*, RANK() OVER (ORDER BY op1.Rank) AS New_Rank
    from ordered_positions op1
    inner join ordered_positions op2
    on (op1.Rank = op2.Rank and op1.Rank = 1 and op1.pos = 's')
    or (op2.Rank = op1.Rank-1 and op2.pos = 'e' and op1.pos = 's') 
    or (op2.Rank = op1.Rank+1 and op2.pos = 's' and op1.pos = 'e')
    or (op2.Rank = op1.Rank and op1.pos = 'e' and op1.Rank = (select max(Rank) from ordered_positions))
)
, final_positions as --Now each s is followed by e. So, select s limits and corresponding e limits. Rank ranges
(
    select cp1.limit as cp1_limit, cp2.limit as cp2_limit,  RANK() OVER (ORDER BY cp1.limit) AS Final_Rank
    from collapsed_positions cp1
    inner join collapsed_positions cp2
    on cp1.pos = 's' and cp2.New_Rank = cp1.New_Rank+1
)
--Finally, subtract 1 from Rank to start Range #'s from 0
select fp.Final_Rank-1 seq_no, fp.cp1_limit as starty, fp.cp2_limit as endy
from final_positions fp;

You can test the result of each CTE and trace the progression. You can do this by removing the following CTE's and selecting from the preceding one, as below, for example.

with temp_positions as  --Select all limits as a single column along with the start / end flag (s / e)
(
    select startx limit, 's' as pos from t
    union
    select endx, 'e' as pos from t
)
, ordered_positions as --Rank all limits
(
    select limit, pos, RANK() OVER (ORDER BY limit) AS Rank
    from temp_positions
)
select *
from ordered_positions;

Upvotes: 2

JBrooks

Reputation: 10013

I would create a table valued function that returns the segments. Then you would call it like:

select *
from dbo.getCollapsedSegments(2, 9)

Here is an example (I replaced END with FIN since END is a reserved word.)

CREATE FUNCTION dbo.getCollapsedSegments(@Start int, @Fin int)
RETURNS @CollapsedSegments TABLE 
(
    -- Columns returned by the function
    start int,
    fin int
)
AS 
BEGIN

    SELECT @Start = (SELECT MIN(Start) FROM data WHERE @Start <= Start)

    WHILE (@Start IS NOT NULL AND @Start < @Fin)
    BEGIN
        INSERT INTO @CollapsedSegments
        SELECT MIN(s1.Start), MAX(ISNULL(s2.Fin, s1.Fin))
        FROM data s1
        LEFT JOIN data s2
        ON s1.Start < s2.Fin
        AND s2.Start <= s1.Fin
        AND @Fin > s2.start
        WHERE s1.Start <= @Start
        AND @Start < s1.Fin

        SELECT @Start = (SELECT MAX(Fin) FROM @CollapsedSegments)

        SELECT @Start = MIN(Start)
        FROM data
        WHERE Start > @Start
    END

    RETURN;
END

My test data:

create table data
(start int,
fin int)

insert into data
select 1, 2
union all
select 3, 5
union all
select 4, 6
union all
select 8, 9
union all
select 10, 11

Upvotes: 0

Gordon Linoff

Reputation: 1269603

The key to doing this is to assign a "grouping" value to overlapping segments. You can then aggregate by this column to get the information you want. A segment starts a group when it doesn't overlap with an earlier segment.

with starts as (
      select t.*,
             (case when exists (select 1 from table t2 where t2.start < t.start and t2.end >= .end)
                   then 0
                   else 1
              end) as isstart
      from table t
     ),
     groups as (
      select s.*,
             (select sum(isstart)
              from starts s2
              where s2.start <= s.start
             ) as grouping
      from starts s
     )
select row_number() over (order by min(start)) as row,
       min(start) as start, max(end) as end
from groups
group by grouping;

Upvotes: 1

SQL - Consolidate Overlapping Data

Answers (3)

Related Questions