Reputation: 1010
I have a table
Archive(VarId SMALLINT, Timestamp DATETIME, Value FLOAT)
VarId
is not unique. The table contains measurements. I have a clustered index on Timestamp
. Now i have the requirement of finding a measurement for a specific VarId before a specific date. So I do:
SELECT TOP(1) *
FROM Archive
WHERE VarId = 135
AND Timestamp < '2012-06-01 14:21:00'
ORDER BY Timestamp DESC;
If there is no such measurement this query searches the whole table. So I introduced another index on (VarId, Timestamp)
.
My problem is: SQL Server doesn't seem to care about it, the query still takes forever. When I explicitly state 'WITH (INDEX = <id>)'
it works as it should. What can I do so SQL Server uses my index automatically?
I'm using SQL Server 2005.
Upvotes: 4
Views: 298
Reputation: 48826
My guess is that your index design is the issue. You have a CLUSTERED index on a DATETIME field and I suspect that it is not unique data, much like VarId, and hence you did not declare it as UNIQUE. Because it is not unique there is a hidden, 4-byte "uniqueifier" field (so that each row can by physically unique regardless of you not giving it unique data) and the rows with the same DATETIME value are essentially random within the group of same DATETIME values (so even narrowing down a time still requires scanning through that grouping). You also have a NONCLUSTERED index on VarId, Timestamp. NONCLUSTERED indexes include the data from the CLUSTERED index so internally your NONCLUSTERED index is really: VarId, Timestamp, Timestamp (from the CLUSTERED index). So you could have left off the Timestamp column in the NONCLUSTERED index and it would have all been the same to the optimizer, but in a sense it would have been better as it would be a smaller index.
So your physical layout is based on a date while the VarId values are spread across those dates. Hence VarId = 135 can be spread very far apart in terms of data pages. Yes, your non-clustered index does group them together, but the optimizer is probably looking at the fact that you are wanting all fields (the "SELECT *" part) and the Timestamp < '2012-06-01 14:21:00' condition in addition to that seems to get most of what you need as opposed to finding a few rows and doing a bookmark lookup to get the "Value" field to fulfill the "SELECT *". Quite possibly if you do just "SELECT TOP(1) VarId, Timestamp" it would more likely use your NONCLUSTERED index without needing the "INDEX =" hint.
Another issue affecting performance overall could be that the ORDER BY is requesting the Timestamp in DESC order and if you have the CLUSTERED index in ASC order then it would be the opposite direction of what you are looking for (at least in this query). Of course, in that case then it might be ok to have Timestamp in the NONCLUSTERED index if it was in DESC order.
My advice is to rethink the CLUSTERED index. Judging on just this query alone (other queries/uses might alter the recommendation), try dropping the NONCLUSTERED index and recreate the CLUSTERED index with the Timestamp field first, in DESC order, and also with the VarId so it can be delcared UNIQUE. So:
CREATE UNIQUE CLUSTERED INDEX [UIX_Archive_Timestamp_VarId]
ON Archive (Timestamp DESC, VarId ASC)
This, of course, assumes that the Timestamp and VarId combination is unique. If not, then still try this without the UNIQUE keyword.
Update:
To pull all of this info and advice together:
When designing indexes you need to consider the distribution of the data and the use-cases for interacting with it. More often than not there is A LOT to consider and several different approaches will appear good in theory. You need to try a few approaches, profile/test them, and see which works best in reality. There is no "always do this" approach without knowing all aspects of what you are doing and what else is going on and what else is planned to use and/or modify this table which I suspect has not been presented in the original question.
So to start the journey, you are ordering records by date and are looking at ranges of dates AND dates naturally occur in order so putting Timestamp first benefits more of what you are doing and has less fragmentation, especially if defined as DESC in the CREATE. Having an NC index on just VarId at that point will then be fine, even if spread out, for looking at a set of rows for a particular VarId. So maybe start there (change order of direction of CLUSTERED index and remove Timestamp from the NC index). See how those changes compare to the existing structure. Then try moving the VarId field into the CLUSTERED index and remove the NC index. You say that the combination is also not unique but does increase the predictability of the ordering of the rows. See how that works. Does this table ever get updated? If not and if the Value field along with Timestamp and VarId would be unique, then try adding that to the CLUSTERED index and be sure to create with the UNIQUE keyword. See how these different approaches work by looking at the Actual Execution Plan and use SET STATISTICS IO ON before running the query and see how the logical reads between the different approaches compare.
Hope this helps :)
Upvotes: 1
Reputation: 1828
There are different possibilities with this. I'll try help you to isolate them:
It could be SQL Server is favouring your Clustered Index (very likely it's the Primary Key) over your newly created index. One way to solve this is to have a NonClustered Primary Key and cluster the index on the other two fields (varid and timestamp). That is, if you don't want varid and timestamp to be the PK.
Also, looking at the (estimated) execution plan might help.
But I believe #1 only works nicely if those 2 fields are the most commonly used (queried) index. To find out if this is the case, it would be good to analyse which index users are most likely use (from http://sqlblog.com/blogs/louis_davidson/archive/2007/07/22/sys-dm-db-index-usage-stats.aspx):
select
ObjectName = object_schema_name(indexes.object_id) + '.' + object_name(indexes.object_id),
indexes.name,
case when is_unique = 1 then 'UNIQUE ' else '' end + indexes.type_desc,
ddius.user_seeks,
ddius.user_scans,
ddius.user_lookups,
ddius.user_updates
from
sys.indexes
left join sys.dm_db_index_usage_stats ddius on (
indexes.object_id = ddius.object_id
and indexes.index_id = ddius.index_id
and ddius.database_id = db_id()
)
WHERE
object_schema_name(indexes.object_id) != 'sys' -- exclude sys objects
AND object_name(indexes.object_id) LIKE 'Archive'
order by
ddius.user_seeks + ddius.user_scans + ddius.user_lookups
desc
Good luck
Upvotes: 3
Reputation: 3070
You might need to analyze your table to collect statistics, so the optimizer can determine whether to use the index or not.
Upvotes: 0