Reputation: 1457
I have a simple data set in SQL Server that appears like this
**ROW Start End**
0 1 2
1 3 5
2 4 6
3 8 9
Graphically, the data would appear like this
What I would like to achieve is to collapse the overlapping data so that my query returns
**ROW Start End**
0 1 2
1 3 6
2 8 9
Is this possible in SQL Server without having to write a complex procedure or statement?
Upvotes: 4
Views: 1084
Reputation: 5669
Here's the SQL Fiddle for another alternative.
First, all the limits are sorted by order. Then the "duplicate" limits within an overlapping range are removed (because a Start is followed by another Start or an End is followed by another End). Now, that the ranges are collapsed, the Start and End values are written out again in the same row.
with temp_positions as --Select all limits as a single column along with the start / end flag (s / e)
(
select startx limit, 's' as pos from t
union
select endx, 'e' as pos from t
)
, ordered_positions as --Rank all limits
(
select limit, pos, RANK() OVER (ORDER BY limit) AS Rank
from temp_positions
)
, collapsed_positions as --Collapse ranges (select the first limit, if s is preceded or followed by e, and the last limit) and rank limits again
(
select op1.*, RANK() OVER (ORDER BY op1.Rank) AS New_Rank
from ordered_positions op1
inner join ordered_positions op2
on (op1.Rank = op2.Rank and op1.Rank = 1 and op1.pos = 's')
or (op2.Rank = op1.Rank-1 and op2.pos = 'e' and op1.pos = 's')
or (op2.Rank = op1.Rank+1 and op2.pos = 's' and op1.pos = 'e')
or (op2.Rank = op1.Rank and op1.pos = 'e' and op1.Rank = (select max(Rank) from ordered_positions))
)
, final_positions as --Now each s is followed by e. So, select s limits and corresponding e limits. Rank ranges
(
select cp1.limit as cp1_limit, cp2.limit as cp2_limit, RANK() OVER (ORDER BY cp1.limit) AS Final_Rank
from collapsed_positions cp1
inner join collapsed_positions cp2
on cp1.pos = 's' and cp2.New_Rank = cp1.New_Rank+1
)
--Finally, subtract 1 from Rank to start Range #'s from 0
select fp.Final_Rank-1 seq_no, fp.cp1_limit as starty, fp.cp2_limit as endy
from final_positions fp;
You can test the result of each CTE and trace the progression. You can do this by removing the following CTE's and selecting from the preceding one, as below, for example.
with temp_positions as --Select all limits as a single column along with the start / end flag (s / e)
(
select startx limit, 's' as pos from t
union
select endx, 'e' as pos from t
)
, ordered_positions as --Rank all limits
(
select limit, pos, RANK() OVER (ORDER BY limit) AS Rank
from temp_positions
)
select *
from ordered_positions;
Upvotes: 2
Reputation: 10013
I would create a table valued function that returns the segments. Then you would call it like:
select *
from dbo.getCollapsedSegments(2, 9)
Here is an example (I replaced END with FIN since END is a reserved word.)
CREATE FUNCTION dbo.getCollapsedSegments(@Start int, @Fin int)
RETURNS @CollapsedSegments TABLE
(
-- Columns returned by the function
start int,
fin int
)
AS
BEGIN
SELECT @Start = (SELECT MIN(Start) FROM data WHERE @Start <= Start)
WHILE (@Start IS NOT NULL AND @Start < @Fin)
BEGIN
INSERT INTO @CollapsedSegments
SELECT MIN(s1.Start), MAX(ISNULL(s2.Fin, s1.Fin))
FROM data s1
LEFT JOIN data s2
ON s1.Start < s2.Fin
AND s2.Start <= s1.Fin
AND @Fin > s2.start
WHERE s1.Start <= @Start
AND @Start < s1.Fin
SELECT @Start = (SELECT MAX(Fin) FROM @CollapsedSegments)
SELECT @Start = MIN(Start)
FROM data
WHERE Start > @Start
END
RETURN;
END
My test data:
create table data
(start int,
fin int)
insert into data
select 1, 2
union all
select 3, 5
union all
select 4, 6
union all
select 8, 9
union all
select 10, 11
Upvotes: 0
Reputation: 1269603
The key to doing this is to assign a "grouping" value to overlapping segments. You can then aggregate by this column to get the information you want. A segment starts a group when it doesn't overlap with an earlier segment.
with starts as (
select t.*,
(case when exists (select 1 from table t2 where t2.start < t.start and t2.end >= .end)
then 0
else 1
end) as isstart
from table t
),
groups as (
select s.*,
(select sum(isstart)
from starts s2
where s2.start <= s.start
) as grouping
from starts s
)
select row_number() over (order by min(start)) as row,
min(start) as start, max(end) as end
from groups
group by grouping;
Upvotes: 1