Nicholas Post
Nicholas Post

Reputation: 1857

Find all integer gaps in SQL

I have a database which is used to store information about different matches for a game that I pull in from an external source. Due to a few issues, there are occasional gaps (which could be anywhere from 1 missing ID to a few hundred) in the database. I want to have the program pull in the data for the missing games, but I need to get that list first.

Here is the format of the table:

id (pk-identity)  |  GameID (int)  |  etc.  |  etc.  

I had thought of writing a program to run through a loop and query for each GameID starting at 1, but it seems like there should be a more efficient way to get the missing numbers.

Is there an easy and efficient way, using SQL Server, to find all the missing numbers from the range?

Upvotes: 9

Views: 8556

Answers (6)

Vikram Mahapatra
Vikram Mahapatra

Reputation: 101

Problem: we need to find the gap range in id field

SELECT * FROM #tab1

id          col1
----------- --------------------
1           a  
2           a  
3           a  
8           a    
9           a  
10          a  
11          a  
15          a  
16          a  
17          a  
18          a

Solution

WITH cte (id,nextId) as
(SELECT t.id, (SELECT TOP 1 t1.id FROM #tab1 t1 WHERE t1.id > t.id) AS nextId  FROM #tab1 t)

SELECT id + 1, nextId - 1 FROM cte
WHERE id + 1 <> nextId

Output

GapStart    GapEnd
----------- -----------
4           7
12          14

Upvotes: 0

Vikram Mahapatra
Vikram Mahapatra

Reputation: 101

     SELECT * FROM #tab1
            id          col1
            ----------- --------------------
            1           a
            2           a
            3           a
            8           a
            9           a
            10          a
            11          a
            15          a
            16          a
            17          a
            18          a

 WITH cte (id,nextId) as
                (SELECT t.id, (SELECT TOP 1 t1.id FROM #tab1 t1 WHERE t1.id > t.id) AS nextId  FROM #tab1 t)

 SELECT id AS 'GapStart', nextId AS 'GapEnd' FROM cte
                WHERE id + 1 <> nextId

    GapStart    GapEnd
    ----------- -----------
    3           8
    11          15

Upvotes: 1

Ben Thul
Ben Thul

Reputation: 32697

I like the "gaps and islands" approach. It goes a little something like this:

WITH Islands AS (
    SELECT GameId, GameID - ROW_NUMBER() OVER (ORDER BY GameID) AS [IslandID]
    FROM dbo.yourTable
)
SELECT MIN(GameID), MAX(Game_id)
FROM Islands
GROUP BY IslandID

That query will get you the list of contiguous ranges. From there, you can self-join that result set (on successive IslandIDs) to get the gaps. There is a bit of work in getting the IslandIDs themselves to be contiguous though. So, extending the above query:

WITH 
cte1 AS (
    SELECT GameId, GameId - ROW_NUMBER() OVER (ORDER BY GameId) AS [rn]
    FROM dbo.yourTable
)
, cte2 AS (
    SELECT [rn], MIN(GameId) AS [Start], MAX(GameId) AS [End]
    FROM cte1
    GROUP BY [rn]
)
,Islands AS (
    SELECT ROW_NUMBER() OVER (ORDER BY [rn]) AS IslandId, [Start], [End]
  from cte2
)

SELECT a.[End] + 1 AS [GapStart], b.[Start] - 1 AS [GapEnd]
FROM Islands AS a
LEFT JOIN Islands AS b
    ON a.IslandID + 1 = b.IslandID

Upvotes: 2

Kaf
Kaf

Reputation: 33809

Try this (This covers upto 10000 Ids starting from 1, if you need more you can add more to Numbers table below):

;WITH Digits AS (
    select Digit 
    from ( values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) as t(Digit))
,Numbers AS (
    select u.Digit 
          + t.Digit*10 
          + h.Digit*100 
          + th.Digit*1000
          + tth.Digit*10000 
          --Add 10000, 100000 multipliers if required here.
          as myId
    from Digits u
    cross join Digits t
    cross join Digits h
    cross join Digits th
    cross join Digits tth
    --Add the cross join for higher numbers 
    )
SELECT myId 
FROM Numbers
WHERE myId NOT IN (SELECT GameId FROM YourTable)

Upvotes: 0

gvee
gvee

Reputation: 17161

Numbers table!

CREATE TABLE dbo.numbers (
   number int NOT NULL
)

ALTER TABLE dbo.numbers
ADD
   CONSTRAINT pk_numbers PRIMARY KEY CLUSTERED (number)
     WITH FILLFACTOR = 100
GO

INSERT INTO dbo.numbers (number)
SELECT (a.number * 256) + b.number As number
FROM     (
        SELECT number
        FROM   master..spt_values
        WHERE  type = 'P'
        AND    number <= 255
       ) As a
 CROSS
  JOIN (
        SELECT number
        FROM   master..spt_values
        WHERE  type = 'P'
        AND    number <= 255
       ) As b
GO

Then you can perform an OUTER JOIN or EXISTS` between your two tables and find the gaps...

SELECT *
FROM   dbo.numbers
WHERE  NOT EXISTS (
         SELECT *
         FROM   your_table
         WHERE  id = numbers.number
       )

-- OR

SELECT *
FROM   dbo.numbers
 LEFT
  JOIN your_table
    ON your_table.id = numbers.number
WHERE  your_table.id IS NULL

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269693

The idea is to look at where the gaps start. Let me assume you are using SQL Server 2012, and so have the lag() and lead() functions. The following gets the next id:

select t.*, lead(id) over (order by id) as nextid
from t;

If there is a gap, then nextid <> id+1. You can now characterize the gaps using where:

select id+1 as FirstMissingId, nextid - 1 as LastMissingId
from (select t.*, lead(id) over (order by id) as nextid
      from t
     ) t
where nextid <> id+1;

EDIT:

Without the lead(), I would do the same thing with a correlated subquery:

select id+1 as FirstMissingId, nextid - 1 as LastMissingId
from (select t.*,
             (select top 1 id
              from t t2
              where t2.id > t.id
              order by t2.id
             ) as nextid
      from t
     ) t
where nextid <> id+1;

Assuming the id is a primary key on the table (or even that it just has an index), both methods should have reasonable performance.

Upvotes: 15

Related Questions