yakya
yakya

Reputation: 5210

SELECT clause column causes index not being used

I tried to index a table today and noticed something awkward and couldn't understand the reason (even if it seems so simple?). I have a table like this:

CREATE TABLE [X].[A](
    [TableName] [varchar](250) NOT NULL,
    [Header] [varchar](4000) NOT NULL,
    [Trailer] [varchar](4000) NOT NULL,
    [FileName] [varchar](50) NOT NULL,
    [StartDate] [date] NOT NULL,
    [EndDate] [date] NOT NULL
) ON [PRIMARY]

Let's assume that I have a query like this (I simplified it a bit):

SELECT 
    Header
FROM
    X.A ht (NOLOCK)
WHERE
    ht.TableName = 'asd' AND
    '2017-05-29' BETWEEN ht.StartDate AND ht.EndDate

I wanted to put an index to (TableName, StartDate, EndDate) since they are in the where clause. My index is below:

CREATE NONCLUSTERED INDEX [NonClusteredIndex-20170530-113847] ON [X].[A]
(
    [TableName] ASC,
    [StartDate] ASC,
    [EndDate] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
GO

But I noticed that my query was not using the index and it does a Table Scan, but when I change the SELECT clause like the following:

SELECT 
    TableName
FROM
    X.A ht (NOLOCK)
WHERE
    ht.TableName = 'asd' AND
    '2017-05-29' BETWEEN ht.StartDate AND ht.EndDate

It does use my index and does Index Seek which is what I want. What's going on?

Upvotes: 0

Views: 624

Answers (2)

GarethD
GarethD

Reputation: 69769

In a really simplified example, if your table had data like this:

ID | TableName | Header | Trailer | FileName | StartDate  | EndDate
---+-----------+--------+---------+----------+------------+------------
 1 |    asd    |   H1   |    T1   |    F1    | 2017-05-28 | 2017-05-30
 2 |    asd    |   H2   |    T2   |    F2    | 2017-05-28 | 2017-05-30
 3 |    asd    |   H3   |    T3   |    F3    | 2017-05-28 | 2017-05-30
 4 |    asd    |   H4   |    T4   |    F4    | 2017-05-28 | 2017-05-30
 5 |    asd    |   H5   |    T5   |    F5    | 2017-05-28 | 2017-05-30
 6 |    asd    |   H6   |    T6   |    F6    | 2017-05-28 | 2017-05-30
 7 |    asd    |   H7   |    T7   |    F7    | 2017-05-28 | 2017-05-30
 8 |    xyz    |   H1   |    T1   |    F1    | 2017-05-28 | 2017-05-30

Then your index would look something like this

TableName | StartDate  | EndDate    | ID | 
----------+------------+------------+----+-
   asd    | 2017-05-28 | 2017-05-30 |  1 | 
   asd    | 2017-05-28 | 2017-05-30 |  2 | 
   asd    | 2017-05-28 | 2017-05-30 |  3 | 
   asd    | 2017-05-28 | 2017-05-30 |  4 | 
   asd    | 2017-05-28 | 2017-05-30 |  5 | 
   asd    | 2017-05-28 | 2017-05-30 |  6 | 
   asd    | 2017-05-28 | 2017-05-30 |  7 | 
   xyz    | 2017-05-28 | 2017-05-30 |  8 | 

In this case you can get back all the columns required for this query:

SELECT 
    TableName
FROM
    X.A ht (NOLOCK)
WHERE
    ht.TableName = 'asd' AND
    '2017-05-29' BETWEEN ht.StartDate AND ht.EndDate

But, if you want to select Header then you can't just use the index (since Header) is not stored in the index, for each row you have to find the clustering key (which in this simple example is ID), then lookup the corresponding row in the main table to get the value of Header. This (Index Seek + Key Lookup) is an expensive operation, and since you have to do this for 9 rows, and the original table only includes 10, it is probably more efficient to simple search the main table in the first place and not use the index (Clustered Index Scan).

Had you been looking up a very small number of records, like xyd, then it is likely that an index seek along with key lookup would have been chosen. I don't think there is a simple cut off for how many records are needed to switch between the two methods, but in an answer to another question I found a cut off value of 3.7% of records, on a fairly small table. I expect as the table size increases this percentage decreases.

Upvotes: 1

iamdave
iamdave

Reputation: 12243

Your index covers the columns you are filtering on, but not the columns that you want to return. This means that the query still has to scan the table to get the RowID of the rows that match your filtering in order to return your Header value.

If you add an include to your index that covers the Header column you should get the seek you are after:

create nonclustered index [NonClusteredIndex-20170530-113847]
on [X].[A]([TableName] asc,[StartDate] asc,[EndDate] asc)
include([Header])
with (pad_index = off,statistics_norecompute = off,sort_in_tempdb = off,drop_existing = off,online = off,allow_row_locks = on,allow_page_locks = on)

Upvotes: 0

Related Questions