Black Dynamite
Black Dynamite

Reputation: 4147

T-SQL: Islands and Gaps, Start/End

I have a data set where I must consolidate the ranges for two particular fields. During research, I've seen that this problem is called "Islands and Gaps". Unfortunately, most of the IAGs concerning merging amongst a single column ("StartDate") or something. Whereas in my problem, I have two columns which constitute a range (PlzVon- PlzBis).

I've found plenty of examples, but they all tend to be all over the place with reference to skinning a cat. Some of them use CTEs, others use LAG/LEAD which I've never heard of till now.

I've tried repurposing one script that I've found mostly because I can understand what it's saying, but no dice. I can see all the "pieces" of the problem (finding upper/lower endpoints, mating the two records) but I can't figure out how to form a coherent statement with it. Example

In the above photo, I would like to combine the highlighted rows so that they are PlzVon-73000 and PlzBis-74999. I can tell that I need either a CTE or a correlated Sub-Query and have an ON condition of b.PlzVon = a.PlzBis + 1. But a single join wouldn't be enough as that new record could then mate with another, leading us down a nasty path of recursion and cursors.

Any help on figuring out how to merge these islands would be greatly appreciated.

Upvotes: 0

Views: 788

Answers (2)

Edmond Quinton
Edmond Quinton

Reputation: 1739

If I understand the question correctly, then this is a data island problem but on a range rather than a key. The following query demonstrates how this problem can be solved using the data island approach.

DECLARE @SourceData TABLE 
(
     ID         INT
    ,PlzVon     INT
    ,PlzBis     INT
)
INSERT INTO @SourceData 
VALUES
(1,38000,38999),
(2,73000,73999),
(3,74000,74999),
(4,75000,75999),
(5,85000,85999);


;WITH CTE_DataIslands  -- First CTE determine the start of each new data island
AS
(
    SELECT           [Main].ID
                    ,[Main].[PlzVon]
                    ,[Main].[PlzBis]
                    ,(
                        CASE
                            WHEN (LAG([Main].[PlzBis], 1) OVER  (ORDER BY [Main].[PlzVon] ASC) + 1 ) <> ([Main].[PlzVon]) THEN 1 -- If prev record's value for ([PlzBis] + 1)  is not equal to current record [PlzVon] value then it is the start of a new data island.
                            ELSE 0
                        END
                     ) AS [IsNewDataIsland]
    FROM            @SourceData [Main]
), CTE_GenerateGroupingID
AS
(
    SELECT  ID
            ,[PlzVon]
            ,[PlzBis]
            ,SUM([IsNewDataIsland]) OVER (ORDER BY [PlzVon] ROWS UNBOUNDED PRECEDING) AS GroupingID -- Create a running total of the IsNewDataIsland column this will create a grouping id we can now group on
    FROM    CTE_DataIslands
)
SELECT      MIN([PlzVon]) AS [PlzVon]       -- Min [PlzVon] will give the lower range
            ,MAX([PlzBis]) AS [PlzBis]      -- Max [PlzBis] will give the upper range (use min or max for any other column that should be included in the return result)
FROM        CTE_GenerateGroupingID
GROUP BY    GroupingID

Upvotes: 2

John Cappelletti
John Cappelletti

Reputation: 81990

I was not clear what the PK was so I added an ID in my sample table

Declare @YourTable Table (ID int,PlzVon int,PlzBis int)
Insert Into @YourTable values
(1,38000,38999),
(1,73000,73999),
(1,74000,74999)

;with cte0(N) As (Select 1 From (Values(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) N(N))
     ,cteN(N) As (Select Top (Select max(PlzBis)-min(PlzVon)+1 From @YourTable) 
                         N=(Select min(PlzVon)-1 from @YourTable)+Row_Number() over (Order By (Select NULL)) 
                   From  cte0 N1, cte0 N2, cte0 N3, cte0 N4, cte0 N5, cte0 N6) 
     ,cteBase As (Select A.ID
                        ,A.PlzVon
                        ,A.PlzBis
                        ,PosNr = N.N
                        ,RowNr = N.N - Row_Number() Over (Partition By A.ID Order By N.N) 
                   From  cteN N
                   Join  @YourTable A on N.N Between A.PlzVon and A.PlzBis
                 )
Select ID
      ,PlzVon = min(PosNr)
      ,PlzBis = max(PosNr)
 From  cteBase
 Group By ID,RowNr
 Order By ID,min(PosNr)

Returns

ID  PlzVon  PlzBis
1   38000   38999
1   73000   74999

Upvotes: 0

Related Questions