Paul
Paul

Reputation: 69

Find id's where records do not exist for IN expression

Can anybody help me work out how to approach a common problem that I come across every now and then but tend to work around the long way? I'm trying to find an elegant way to identifying what Id's don't exist in a table given a large list of values.

I've been provided with an Excel sheet with a list of 4000+ string id's and I need to identify which of them don't exist in SQL Server table.

My first action was to use Notepad++ and convert the list into a CSV list similar to 'XX1','XX2','XX3' etc and the running a SELECT * FROM [TABLE] WHERE [ID] IN ('XX1','XX2','XX3',...). This shows up 2 fewer records than shown in Excel so now I need to sift through them to identify which two aren't in the database.

Thanks for any help.

Paul.

Upvotes: 2

Views: 4177

Answers (3)

Biju jose
Biju jose

Reputation: 273

Believing you can create tables in database

If you are getting files in excel sheet or in email move the content to an excel sheet if its an email or directly import the excel sheet using Import data wizard to a staging table after that you can compare the data in staging table and your original table easily which much easier than loading the id in table variables or #temp tables. This video shows how to import excel data to sql server https://www.youtube.com/watch?v=Z1vqhYlwcyA

Also do a random check that all the values are populated or is any NULL values are inserted also.

One more thing to add using IN clause can which having thousands of values can throw an error. Please check this page in SO "IN" clause limitation in Sql Server

Upvotes: 1

LukStorms
LukStorms

Reputation: 29667

Just a few demo's to add to the possibilies

Via a table variable.
This time with a NOT IN clause:

DECLARE @varTbl TABLE(ID varchar(8));

insert into @varTbl values ('XX1'),('XX2');

select id from @varTbl 
where id not in (select distinct id from [SomeTable]);

Via a temporary table that only exists during your session.
This time with a LEFT JOIN.

CREATE TABLE #tempTbl (ID varchar(8));

insert into #tempTbl values ('XX1'),('XX2');

select tmp.id 
from #tempTbl tmp
left join [SomeTable] t on (tmp.id = t.id)
where t.id is null
group by tmp.id
order by tmp.id;

Another completely diffent method is to compare via files. You first export the ID's of your reference data and the ID's from the destination table to sorted text files. Then you filter out the differences of lines that exist in the reference data but not in the destination table.
On Windows it's something you could do via PowerShell.

$ref = Get-Content WhatShouldBe.txt
$dest = Get-Content WhatIsNow.txt

Compare-Object $ref $dest |Where-Object {$_.SideIndicator -eq "<="} |Select -ExpandProperty InputObject > WhatCanBeAdded.txt

Upvotes: 1

Lasse V. Karlsen
Lasse V. Karlsen

Reputation: 391446

You can join against an inline table generated with the table value constructor, provided it doesn't overflow with that many id's, in which case you need to dump them into a table.

The limit for number of rows in one VALUES expression is 1000, according to the documentation:

The maximum number of rows that can be constructed by inserting rows directly in the VALUES list is 1000

To join with an inline table, use the syntax:

INNER JOIN (VALUES (1), (2), ...) AS inline (id)

Here's an example, with this table (your actual table):

CREATE TABLE databasetable (ID INT NOT NULL)
INSERT INTO databasetable VALUES (1), (2), (3), (5), (6), (8), (9), (10)

you can query for the missing values like this:

SELECT
    excel.id
FROM
    databasetable AS A
    RIGHT JOIN (VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10)) AS excel (id)
        ON A.id = excel.id
WHERE
    A.id IS NULL

You would thus have to generate the VALUES (1), (2), (3), ...) from your Excel spreadsheet.

To use a temporary table, if you have more than 1000 rows:

DECLARE @EXCEL TABLE (ID INT)
INSERT INTO @EXCEL VALUES (1), (2), (3), ..... -- first 1000
INSERT INTO @EXCEL VALUES (1), (2), (3), ..... -- next 1000
INSERT INTO ...                                -- and so on

SELECT
    @excel.id
FROM
    databasetable AS A
    RIGHT JOIN @excel
        ON A.id = @excel.id
WHERE
    A.id IS NULL

Upvotes: 4

Related Questions