Reputation: 673

IN operator with a very large list is too slow in SQL Server. What should I use instead?

I got a complex SQL query where the IN statement could contain over 50,000 ids. The query timeouts after 20-30 seconds and the only way to resolve this issue is to change the query to return the results hopefully in a no more than 1-2 seconds. Increasing the timeout is NOT an option.

Create table:

CREATE TABLE Items
(
    [Id] [INT] IDENTITY(1,1) NOT NULL,
    [Name] NVARCHAR(200) NULL,
) 

CREATE TABLE Feature
(
    [Id] [INT] IDENTITY(1,1) NOT NULL,
    [ItemId] [INT] NOT NULL,
    [TagId] [INT] NOT NULL,
)

Feature table ItemId column is a non unique, non clustered index as well.

Select script:

SELECT i.Name
FROM Items i
INNER JOIN Feature f ON i.Id = f.ItemId
WHERE f.TagId IN (123, 234, 456, .....)

Even if the Items table have 3-400,000 records and the IN operator list has around 80,000 in the list, it will timeout.

I must improve the performance dramatically

Upvotes: 5

Answers (4)

David Browne - Microsoft

Reputation: 89051

The key thing here is to get the IDS out of the query text SQL. You can pass the IDS in JSON or XML, or a Table-Valued Parameter, or bulk load a temp table. All will be much faster.

EG this

use AdventureWorks2017
set nocount on
dbcc freeproccache
go

declare @ids varchar(max) = ( select STRING_AGG(cast(salesorderid as varchar(max)),',') from sales.SalesOrderHeader )

select @ids 

select count(*)
from 
sales.SalesOrderheader

declare @sql nvarchar(max) = '
select * 
from sales.salesorderheader
where SalesOrderID in (' + @ids + ')'

print '---------IN list---------'
set statistics time on
exec (@sql)
set statistics time off
print '---------IN list---------'



print '---------JSON array---------'
set statistics time on
select * 
from sales.salesorderheader
where SalesOrderID in ( select value from openjson('[' + @ids + ']') )
set statistics time off
print '---------JSON array---------'

outputs

DBCC execution completed. If DBCC printed error messages, contact your system administrator.
---------IN list---------
SQL Server parse and compile time: 
   CPU time = 11781 ms, elapsed time = 12115 ms.

 SQL Server Execution Times:
   CPU time = 657 ms,  elapsed time = 1453 ms.

 SQL Server Execution Times:
   CPU time = 12438 ms,  elapsed time = 13569 ms.
---------IN list---------
---------JSON array---------

 SQL Server Execution Times:
   CPU time = 656 ms,  elapsed time = 984 ms.
---------JSON array---------

To use this method from C# is super-simple. Just serialize an array or list into a string, and pass it in a SqlParameter. EG:

var con = new SqlConnection("Server=localhost;database=adventureworks2017;integrated security=true");
con.Open();

var ids = Enumerable.Range(1, 50_000).ToList();

var cmd = con.CreateCommand();
cmd.CommandText = "select * from sales.SalesOrderHeader where salesorderid in (select value from openjson(@pIds))";
var pIds = cmd.Parameters.Add("@pIds", SqlDbType.NVarChar);

pIds.Value = JsonSerializer.Serialize(ids);

using (var rdr = cmd.ExecuteReader())
{
        while (rdr.Read())
        {
            //do whatever
            Console.Write(".");
        }
}

Upvotes: 7

LukStorms

Reputation: 29647

Maybe using an EXISTS could be faster.
Since the query doesn't seem to need fields from the Feature table.

SELECT i.Name
FROM Items i
WHERE EXISTS (
    SELECT 1
    FROM Feature f
    WHERE f.TagId IN (123,234,456,.....)
      AND f.ItemId = i.id
);

If that doesn't work, some other ideas.
- Load the list of Tagid's into a new table (temporary?) with a primary key on Tagid. Then in the query join to that table instead of using IN.
- Add a non-unique index on Feature.TagId

Upvotes: 0

Rbell

Reputation: 46

--develop your list for the IN operator    
with cte1 as
(
select tagId
from feature
where ...
),
-- limit your feature table to tagid that you want to see
cte2 as
(
select tagid
from feature a
inner join cte1 b
on a.tagid = b.tagid
)
SELECT i.Name
FROM Items i
inner join cte2 b
on i.tagid = b.tagid

Upvotes: 0

Altair Souza

Reputation: 51

You can insert the ids into a temporary table and then use it inside the join, also an index on the temp table may help

Upvotes: 2

IN operator with a very large list is too slow in SQL Server. What should I use instead?

Answers (4)

Related Questions