Reputation: 1492
I am using SQL Server to store tens of millions of records. I need to be able to query its tables to find missing rows where there are gaps in the Id column, as there should be none.
I am currently using a solution that I have found here on StackOverflow:
CREATE PROCEDURE [dbo].[find_missing_ids]
@Table NVARCHAR(128)
AS
BEGIN
DECLARE @query NVARCHAR(MAX)
SET @query = 'WITH Missing (missnum, maxid) '
+ N'AS '
+ N'('
+ N' SELECT 1 AS missnum, (select max(Id) from ' + @Table + ') '
+ N' UNION ALL '
+ N' SELECT missnum + 1, maxid FROM Missing '
+ N' WHERE missnum < maxid '
+ N') '
+ N'SELECT missnum '
+ N'FROM Missing '
+ N'LEFT OUTER JOIN ' + @Table + ' tt on tt.Id = Missing.missnum '
+ N'WHERE tt.Id is NULL '
+ N'OPTION (MAXRECURSION 0);';
EXEC sp_executesql @query
END;
This solution has been working very well, but it has been getting slower and more resource intensive as the tables have grown. Now, running the procedure on a table of 38 million rows is taking about 3.5 minutes and lots of CPU.
Is there a more efficient way to perform this? After a certain range has been found to not contain any missing Ids, I no longer need to check that range again.
Upvotes: 1
Views: 1946
Reputation: 3932
JBJ's answer is almost complete. The query needs to return the From and Through for each range of missing values.
select B+1 as [From],A-1 as[Through]from
(select StuffID as A,
lag(StuffID)over(order by StuffID)as B from Stuff)z
where A<>B+1
order by A
I created a test table with 50 million records, then deleted a few. The first row of the result is:
From Through
33 35
This indicates that all IDs in the range from 33 through 35 are missing, i.e. 33, 34 and 35.
On my machine the query took 37 seconds.
Upvotes: 6
Reputation: 17943
Try this solution, it will be faster than CTE
.
;WITH CTE AS
(
SELECT ROW_NUMBER()
OVER (
ORDER BY (SELECT NULL)) RN
FROM ( values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) v(id) --10 ROWS
CROSS JOIN ( values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) v1(id)--100 ROWS
CROSS JOIN ( values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) v2(id) --1000 ROWS
CROSS JOIN ( values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) v3(id) --10000 ROWS
CROSS JOIN ( values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) v4(id)--100000 ROWS
CROSS JOIN ( values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) v5(id)--1000000 ROWS
)
SELECT RN AS Missing
FROM CTE C
LEFT JOIN YOURABLE T ON T.ID=R.ID
WHERE T.ID IS NULL
If you want you can use master..[spt_values]
also to generate the number like following.
SELECT (ROW_NUMBER() OVER (ORDER BY (SELECT NULL))) RN
FROM master..[spt_values] T1
CROSS JOIN (select top 500 * from master..[spt_values]) T2
Above query will generate 1268500
numbers
Note: You need to add the CROSS JOIN
as per your requirement.
Upvotes: 1
Reputation: 403
try
select pId
from (select Id, lag(Id) over (order by Id) pId from yourschema.yourtable) e
where pId <> (Id-1)
order by Id
replacing yourschema.yourtable with the appropriate table information
Upvotes: 1