Reputation: 673
I got a complex SQL query where the IN
statement could contain over 50,000 ids. The query timeouts after 20-30 seconds and the only way to resolve this issue is to change the query to return the results hopefully in a no more than 1-2 seconds. Increasing the timeout is NOT an option.
Create table:
CREATE TABLE Items
(
[Id] [INT] IDENTITY(1,1) NOT NULL,
[Name] NVARCHAR(200) NULL,
)
CREATE TABLE Feature
(
[Id] [INT] IDENTITY(1,1) NOT NULL,
[ItemId] [INT] NOT NULL,
[TagId] [INT] NOT NULL,
)
Feature
table ItemId
column is a non unique, non clustered index as well.
Select script:
SELECT i.Name
FROM Items i
INNER JOIN Feature f ON i.Id = f.ItemId
WHERE f.TagId IN (123, 234, 456, .....)
Even if the Items
table have 3-400,000 records and the IN
operator list has around 80,000 in the list, it will timeout.
I must improve the performance dramatically
Upvotes: 5
Views: 9181
Reputation: 89051
The key thing here is to get the IDS out of the query text SQL. You can pass the IDS in JSON or XML, or a Table-Valued Parameter, or bulk load a temp table. All will be much faster.
EG this
use AdventureWorks2017
set nocount on
dbcc freeproccache
go
declare @ids varchar(max) = ( select STRING_AGG(cast(salesorderid as varchar(max)),',') from sales.SalesOrderHeader )
select @ids
select count(*)
from
sales.SalesOrderheader
declare @sql nvarchar(max) = '
select *
from sales.salesorderheader
where SalesOrderID in (' + @ids + ')'
print '---------IN list---------'
set statistics time on
exec (@sql)
set statistics time off
print '---------IN list---------'
print '---------JSON array---------'
set statistics time on
select *
from sales.salesorderheader
where SalesOrderID in ( select value from openjson('[' + @ids + ']') )
set statistics time off
print '---------JSON array---------'
outputs
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
---------IN list---------
SQL Server parse and compile time:
CPU time = 11781 ms, elapsed time = 12115 ms.
SQL Server Execution Times:
CPU time = 657 ms, elapsed time = 1453 ms.
SQL Server Execution Times:
CPU time = 12438 ms, elapsed time = 13569 ms.
---------IN list---------
---------JSON array---------
SQL Server Execution Times:
CPU time = 656 ms, elapsed time = 984 ms.
---------JSON array---------
To use this method from C# is super-simple. Just serialize an array or list into a string, and pass it in a SqlParameter. EG:
var con = new SqlConnection("Server=localhost;database=adventureworks2017;integrated security=true");
con.Open();
var ids = Enumerable.Range(1, 50_000).ToList();
var cmd = con.CreateCommand();
cmd.CommandText = "select * from sales.SalesOrderHeader where salesorderid in (select value from openjson(@pIds))";
var pIds = cmd.Parameters.Add("@pIds", SqlDbType.NVarChar);
pIds.Value = JsonSerializer.Serialize(ids);
using (var rdr = cmd.ExecuteReader())
{
while (rdr.Read())
{
//do whatever
Console.Write(".");
}
}
Upvotes: 7
Reputation: 29647
Maybe using an EXISTS could be faster.
Since the query doesn't seem to need fields from the Feature table.
SELECT i.Name
FROM Items i
WHERE EXISTS (
SELECT 1
FROM Feature f
WHERE f.TagId IN (123,234,456,.....)
AND f.ItemId = i.id
);
If that doesn't work, some other ideas.
- Load the list of Tagid's into a new table (temporary?) with a primary key on Tagid. Then in the query join to that table instead of using IN.
- Add a non-unique index on Feature.TagId
Upvotes: 0
Reputation: 46
--develop your list for the IN operator
with cte1 as
(
select tagId
from feature
where ...
),
-- limit your feature table to tagid that you want to see
cte2 as
(
select tagid
from feature a
inner join cte1 b
on a.tagid = b.tagid
)
SELECT i.Name
FROM Items i
inner join cte2 b
on i.tagid = b.tagid
Upvotes: 0
Reputation: 51
You can insert the ids into a temporary table and then use it inside the join, also an index on the temp table may help
Upvotes: 2