Reputation: 5210
I tried to index a table today and noticed something awkward and couldn't understand the reason (even if it seems so simple?). I have a table like this:
CREATE TABLE [X].[A](
[TableName] [varchar](250) NOT NULL,
[Header] [varchar](4000) NOT NULL,
[Trailer] [varchar](4000) NOT NULL,
[FileName] [varchar](50) NOT NULL,
[StartDate] [date] NOT NULL,
[EndDate] [date] NOT NULL
) ON [PRIMARY]
Let's assume that I have a query like this (I simplified it a bit):
SELECT
Header
FROM
X.A ht (NOLOCK)
WHERE
ht.TableName = 'asd' AND
'2017-05-29' BETWEEN ht.StartDate AND ht.EndDate
I wanted to put an index to (TableName, StartDate, EndDate)
since they are in the where clause. My index is below:
CREATE NONCLUSTERED INDEX [NonClusteredIndex-20170530-113847] ON [X].[A]
(
[TableName] ASC,
[StartDate] ASC,
[EndDate] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
GO
But I noticed that my query was not using the index and it does a Table Scan
, but when I change the SELECT
clause like the following:
SELECT
TableName
FROM
X.A ht (NOLOCK)
WHERE
ht.TableName = 'asd' AND
'2017-05-29' BETWEEN ht.StartDate AND ht.EndDate
It does use my index and does Index Seek
which is what I want. What's going on?
Upvotes: 0
Views: 624
Reputation: 69769
In a really simplified example, if your table had data like this:
ID | TableName | Header | Trailer | FileName | StartDate | EndDate
---+-----------+--------+---------+----------+------------+------------
1 | asd | H1 | T1 | F1 | 2017-05-28 | 2017-05-30
2 | asd | H2 | T2 | F2 | 2017-05-28 | 2017-05-30
3 | asd | H3 | T3 | F3 | 2017-05-28 | 2017-05-30
4 | asd | H4 | T4 | F4 | 2017-05-28 | 2017-05-30
5 | asd | H5 | T5 | F5 | 2017-05-28 | 2017-05-30
6 | asd | H6 | T6 | F6 | 2017-05-28 | 2017-05-30
7 | asd | H7 | T7 | F7 | 2017-05-28 | 2017-05-30
8 | xyz | H1 | T1 | F1 | 2017-05-28 | 2017-05-30
Then your index would look something like this
TableName | StartDate | EndDate | ID |
----------+------------+------------+----+-
asd | 2017-05-28 | 2017-05-30 | 1 |
asd | 2017-05-28 | 2017-05-30 | 2 |
asd | 2017-05-28 | 2017-05-30 | 3 |
asd | 2017-05-28 | 2017-05-30 | 4 |
asd | 2017-05-28 | 2017-05-30 | 5 |
asd | 2017-05-28 | 2017-05-30 | 6 |
asd | 2017-05-28 | 2017-05-30 | 7 |
xyz | 2017-05-28 | 2017-05-30 | 8 |
In this case you can get back all the columns required for this query:
SELECT
TableName
FROM
X.A ht (NOLOCK)
WHERE
ht.TableName = 'asd' AND
'2017-05-29' BETWEEN ht.StartDate AND ht.EndDate
But, if you want to select Header
then you can't just use the index (since Header
) is not stored in the index, for each row you have to find the clustering key (which in this simple example is ID
), then lookup the corresponding row in the main table to get the value of Header
. This (Index Seek + Key Lookup) is an expensive operation, and since you have to do this for 9 rows, and the original table only includes 10, it is probably more efficient to simple search the main table in the first place and not use the index (Clustered Index Scan).
Had you been looking up a very small number of records, like xyd
, then it is likely that an index seek along with key lookup would have been chosen. I don't think there is a simple cut off for how many records are needed to switch between the two methods, but in an answer to another question I found a cut off value of 3.7% of records, on a fairly small table. I expect as the table size increases this percentage decreases.
Upvotes: 1
Reputation: 12243
Your index covers the columns you are filtering on, but not the columns that you want to return. This means that the query still has to scan the table to get the RowID
of the rows that match your filtering in order to return your Header
value.
If you add an include
to your index that covers the Header
column you should get the seek you are after:
create nonclustered index [NonClusteredIndex-20170530-113847]
on [X].[A]([TableName] asc,[StartDate] asc,[EndDate] asc)
include([Header])
with (pad_index = off,statistics_norecompute = off,sort_in_tempdb = off,drop_existing = off,online = off,allow_row_locks = on,allow_page_locks = on)
Upvotes: 0