stusmith
stusmith

Reputation: 14103

Why does SQL Server use an Index Scan instead of a Seek + RID lookup?

I have a table with approx. 135M rows:

CREATE TABLE [LargeTable]
(
    [ID] UNIQUEIDENTIFIER NOT NULL,
    [ChildID] UNIQUEIDENTIFIER NOT NULL,
    [ChildType] INT NOT NULL
)

It has a non-clustered index with no included columns:

CREATE NONCLUSTERED INDEX [LargeTable_ChildID_IX]
  ON [LargeTable] 
(
    [ChildID] ASC
)

(It is clustered on ID).

I wish to join this against a temporary table which contains a few thousand rows:

CREATE TABLE #temp
(
    ChildID         UNIQUEIDENTIFIER PRIMARY KEY,
    ChildType       INT
)

...add #temp data...

SELECT lt.ChildID, lt.ChildType
    FROM #temp t
    INNER  JOIN [LargeTable] lt
        ON lt.[ChildID] = t.[ChildID]

However the query plan includes an index scan on the large table:

Index Scan

If I change the index to include extra columns:

CREATE NONCLUSTERED INDEX [LargeTable_ChildID_IX] ON [LargeTable] 
(
    [ChildID] ASC
)
INCLUDE [ChildType]

Then the query plan changes to something more sensible:

Index Seek

So my question is: Why can't SQL Server still use an index seek in the first scenario, but with a RID lookup to get from the non-clustered index to the table data? Surely that would be more efficient than an index scan on such a large table?

Upvotes: 2

Views: 2239

Answers (3)

Remus Rusanu
Remus Rusanu

Reputation: 294317

Colloquially is called 'the index tipping point'. Basically, at what point does the cost based optimizer consider that is more effective to do a scan rather than seek + lookup. Usually is around 20% of the size, which in your case will base on an estimate coming from the #temp table stats. YMMV.

You already have your answer: include the required column, make the index covering.

Upvotes: 0

Kris Vandermotten
Kris Vandermotten

Reputation: 10201

The first query plan actually makes a lot of sense. Remember that SQL Server never reads records, it reads pages. In your table, a page contains many records, since those records are so small.

With the original index, if the second query plan would be used, after finding all the RID's in the index, and reading index pages to do so, pages in the clustered index need to be read to read the ChildType column. In a worst case scenario, that is an entire page for each record it needs to read. As there are many records per page, that might boil down to reading a large percentage of the pages in the clustered index.

SQL server guessed, based on statistics, that simply scanning the pages in the clustered index would require less page reads in total, because it then avoids reading the pages in the non-clustered index.

What matters here is the number of rows in the temp table compared to the number of pages in the large table. Assuming a random distribution of ChildID in the large table, as soon as the number of rows in the temp table approaches or supersedes the number of pages in the large table, SQL server will have to read virtually every page in the large table anyway.

Upvotes: 1

NickyvV
NickyvV

Reputation: 1744

Because the column ChildType isn't covered in an index, it has to go back to the clustered index (with the mentioned Row IDentifier lookup) to get the values for ChildType.
When you INCLUDE this column in the nonclustered index it will be added to the leaf-level of the index where it is available for querying.

Upvotes: 0

Related Questions