Randy Minder
Randy Minder

Reputation: 48522

Why isn't a particular index being used in a query?

I have a table named Workflow. It has 37M rows in it. There is a primary key on the ID column (int) plus an additional column. The ID column is the first column in the index.

If I execute the following query, the PK is not used (unless I use an index hint)

Select Distinct(SubID) From Workflow Where ID >= @LastSeenWorkflowID

If I execute this query instead, the PK is used

Select Distinct(SubID) From Workflow Where ID >= 786400000

I suspect the problem is with using the parameter value in the query (which I have to do). I really don't want to use an index hint. Is there a workaround for this?

Upvotes: 6

Views: 172

Answers (3)

Francois
Francois

Reputation: 11

This is a classic example of local variable producing a sub-optimal plan.

You should use OPTION (RECOMPILE) in order to compile your query with the actual parameter value of ID.

See my blog post for more information: http://www.sqlbadpractices.com/using-local-variables-in-t-sql-queries/

Upvotes: 0

Remus Rusanu
Remus Rusanu

Reputation: 294447

Please post the execution plan(s), as well as the exact table definition, including all indexes.

When you use a variable the optimizer does no know what selectivity the query will have, the @LastSeenWorkflowID may filter out all but very last few rows in Workflow, or it may include them all. The generated plan has to work in both situations. There is a threshold at which the range seek over the clustered index is becoming more expensive than a full scan over a non-clustered index, simply because the clustered index is so much wider (it includes every column in the leaf levels) and thus has so much more pages to iterate over. The plan generated, which considers an unknown value for @LastSeenWorkflowID, is likely crossing that threshold in estimating the cost of the clustered index seek and as such it chooses the scan over the non-clustered index.

You could provide a narrow index that is aimed specifically at this query:

CREATE INDEX WorkflowSubId ON Workflow(ID, SubId);

or:

CREATE INDEX WorkflowSubId ON Workflow(ID) INCLUDE (SubId);

Such an index is too-good-to-pass for your query, no matter the value of @LastSeenWorkflowID.

Upvotes: 3

George Mastros
George Mastros

Reputation: 24506

Assuming your PK is an identity OR is always greater than 0, perhaps you could try this:

Select Distinct(SubID) 
From Workflow 
Where ID >= @LastSeenWorkflowID
    And ID > 0

By adding the 2nd condition, it may cause the optimizer to use an index seek.

Upvotes: 2

Related Questions