Pete
Pete

Reputation: 45

SQL query with lots of IN parameters is slow

I am executing a number of queries with many values specified in an "IN" clause, like this:

SELECT 
    [time_taken], [distance], [from_location_geocode_id],
    [to_location_geocode_id] 
FROM 
    [Travel_Matrix] 
WHERE 
    [from_location_geocode_id] IN (@param1, @param2, @param3, @param4, @param5) 
    AND [to_location_geocode_id] IN (@param1, @param2, @param3, @param4, @param5)

The example shows 5 parameters, but in practice there can be hundreds of these.

For a small numbers of parameters (up to about 400), SQL Server uses an execution plan with a number of "compute scalar" operations, which are then concatenated, sorted and joined in order to return the results.

For a large number of parameters (over 400), it uses a "hash match (right semi join)" method, which is quicker.

However, I would like it to use the second execution plan much earlier e.g. on queries with 50 parameters, since my tests have shown queries with 50-400 parameters tend to get very slow.

I've tried using various "OPTION" values on my query, but cannot get it to execute using the second execution plan, which I know would be more efficient.

I'd be grateful to anyone who can advise how to give the query the correct hints, so that it executes in the manner of the second execution plan.

Thanks

Upvotes: 4

Views: 5995

Answers (3)

TheGameiswar
TheGameiswar

Reputation: 28890

You also can create filtered index with those parameters,Even if you have index specifically covering all column values.With filtered index ,your queries will be much faster.But your inserts will be little slower and filtered indexes specifically fit your purpose..

Ex:
create table test
(
id int
)

insert into test
select top 100* from numbers
where n<=1100

now if our queries are always with large parameters say id in (2,100,45,98...)

if we create a filtered index like below

create index on dbo.test(id)
where id in (2,958,100)

our query will use that index and will be much faster,of course there are few limitations like between queries,case queries ,slower inserts.But i recommend testing this option and also make it covered

Update:
Further statistics are key to estimating row values,if you dont have an index with fromlcoationid and tolcoationid as key columns,sql will not create multicolumn stats.So one more option is to create multicolumn stats ,if you dont want to go with filtered index approach...

create statistics test1 on dbo.test(fromlocationid,tolcoationid)
where fromlocationid in (@param1,.....) and tolocationid in (@param1,@param2...)

Only issue i see with filtered stats is ,they will not be updated so frequently compared to regular stats . so you may want to try updating them manually through a job depending on your needs

Upvotes: 0

Jaydip Jadhav
Jaydip Jadhav

Reputation: 12309

In Performance perspective IN clause is not good, try some thing below like this

DECLARE @Tmp TABLE(Id INT)
INSERT INTO @Tmp(Id) VALUES(@param1), (@param2), (@param3), (@param4), (@param5)

SELECT 
   [time_taken], [distance], [from_location_geocode_id],
   [to_location_geocode_id] 
FROM 
[Travel_Matrix] 
WHERE 
EXISTS (SELECT 1 FROM @Tmp Where @Tmp.Id=[from_location_geocode_id])
AND EXISTS (SELECT 1 FROM @Tmp Where @Tmp.Id=[to_location_geocode_id])

Upvotes: 4

koolkoda
koolkoda

Reputation: 365

I think 400 parameters using the IN clause is too much. You are better off storing these values in a temporary table and doing a JOIN on it, maybe with an index on the temp table's column to speed things up.

Upvotes: 5

Related Questions