Slider345

Reputation: 4758

Inconsistent SQL Server Execution Plan Key Lookups

I have a big table in SQL Server 2005 that I have to select records from via a non-clustered key, and I'm trying to optimize this process as best I can.

The table has quite a few columns, and I have added non-clustered indexes on three different columns.

SELECT * FROM table WHERE Field1 = 10;
SELECT * FROM table WHERE Field2 = 40;
SELECT * FROM table WHERE Field3 = 'A';

Field1 and Field2 are integer fields, and Field3 is a varchar.

When I request an estimated query execution plan from SQL Server for these three queries, I get basically the same plan for each:

SELECT -> Nested Loop -> Index Seek
                      -> Key Lookup

But I find that the execution times are vastly inconsistent. Specifically, the second query accounts for 98% of the total query cost. Its execution plan is the same as the others, except that the Key Lookup step has a cost of 100% relative to the Index Seek. In the other two queries, it's closer to 50%.

I understand that a Key Lookup is undesirable, and can be avoided by adding columns to the indexes so that the extra columns don't need to be looked up. In this case however, I want all the columns in the table to be returned to me, so it doesn't make sense to add them all to the index. But how could one index cause a Key Lookup operation to take so much longer than another Key Lookup?

Upvotes: 3

Views: 1760

Answers (3)

RichardTheKiwi

Reputation: 107716

But how could one index cause a Key Lookup operation to take so much longer than another Key Lookup?

It all depends on the current statistics of the expected Key Lengths.

The Query Optimizer (QO) works by looking at statistics on the indexes. The index on Field2 could have an average cardinality of 100, whereas the indexes on the other fields could have an average cardinality of 10000 (100 times more selective). So the plan gives you a relative indication, based on averaged statistics.
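You can inspect those statistics yourself. A minimal sketch, assuming a table `dbo.MyTable` with an index `IX_Field2` (both names are illustrative, not from the question):

```sql
-- Show the statistics the optimizer reads for one index.
-- The header gives total rows and density; the histogram shows
-- how rows are distributed across ranges of key values, which is
-- what drives the estimated row counts in the plan.
DBCC SHOW_STATISTICS ('dbo.MyTable', 'IX_Field2');

-- If the statistics look stale, refresh them:
UPDATE STATISTICS dbo.MyTable IX_Field2;
```

A lopsided histogram here (a few values covering most rows) is exactly the situation that makes one key lookup far more expensive than another.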

To see specifics, always turn on statistics*, but that really only gives you the actual execution cost for specific values. In some ways the QO can be more accurate over the long term if the access values are random.

Consider a case of these two queries

SELECT * FROM table WHERE Field2 = 40;
SELECT * FROM table WHERE Field2 = 42;

Let's say, hypothetically, that 42 is a special code used in 80% of all records, while 40 is a unique code used in only one record. Would you expect the QO to show different estimated row counts for each? Yet if you ran the queries, then unless parameterization/plan caching is involved, it is likely the second one would use the clustered index to scan the table instead of performing an expensive bookmark lookup on 80% of the rows.
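With a cached parameterized plan, both values reuse whatever plan was compiled first. One way to see the histogram-driven difference is to force a fresh compile per literal; a sketch, with `dbo.MyTable` standing in for the real table name:

```sql
-- OPTION (RECOMPILE) makes the QO re-optimize using the actual
-- literal, so the rare value (40) can get a seek + lookup plan
-- while the common value (42) can get a clustered index scan.
SELECT * FROM dbo.MyTable WHERE Field2 = 40 OPTION (RECOMPILE);
SELECT * FROM dbo.MyTable WHERE Field2 = 42 OPTION (RECOMPILE);
```

The trade-off is compilation cost on every execution, so this is a diagnostic tool more than a default setting.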

* to turn on statistics reporting

set statistics io on
set statistics time on

Upvotes: 1

Bradford Hoagland

Reputation: 291

Cost is not necessarily related to time, but instead related more closely to resource consumption. It's hard to say really what is going on in your case without seeing the query, knowing the table structures, and the makeup of the data contained in the columns. But the number of records matched by your criteria can make a difference in cost in the different query plans.

Regarding the indexing of your table, if you can include as many of the criteria columns in your index as possible, you may be better off. Depending on your data, you may get a bigger boost by indexing on the most selective column first (the column most likely to eliminate the most data). That is not necessarily the case, however, especially if you join with other tables, sort the results, etc. You can add other columns as necessary or beneficial, and you can use the INCLUDE clause when creating your index to add less-selective "where" columns to the index and reduce the number of key lookups required.
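A sketch of such a covering index, assuming the three columns from the question and an illustrative table/index name:

```sql
-- Key on the filtered column; INCLUDE carries the other columns
-- at the leaf level of the index, so a query that needs only
-- these columns is answered from the index alone, with no key
-- lookup back into the clustered index.
CREATE NONCLUSTERED INDEX IX_MyTable_Field2_Covering
    ON dbo.MyTable (Field2)
    INCLUDE (Field1, Field3);
```

Note this only helps when the SELECT list is limited to the covered columns; a `SELECT *` against a wide table, as in the question, would still need the lookup unless every column were included, which defeats the purpose.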

Upvotes: 2

Joe Stefanelli

Reputation: 135808

I've seen several cases where the "query cost (relative to the batch)" percentage can be misleading to say the least.

Better to look at the actual execution plans in conjunction with io and time statistics to understand what's really happening.

set statistics io on
set statistics time on

SELECT * FROM table WHERE Field1 = 10;
SELECT * FROM table WHERE Field2 = 40;
SELECT * FROM table WHERE Field3 = 'A';

Then look at the logical reads, CPU times, and elapsed times returned for each query.

Upvotes: 0
