Reputation: 521
was wondering why this query performs slowly. If anyone could walk me through how its processed that would be great. The DB being queried has over 500 million rows. Is this query really that poorly written that a TOP 10 takes so long to complete it may as well never finish? How might I improve the query assuming I still want to query data by month+year?
SELECT TOP 10 *
FROM ADB.dbo.Stuff tt
WHERE MONTH(tt.SomeDate) = 5
AND
YEAR(tt.SomeDate) = 2011
Does SELECT TOP 10 not just halt after 10 results have been acquired? Or does it take so long because it hasn't found my conditions yet while going through the 500m+ rows?
Thanks and sorry for such a simple question.
Upvotes: 0
Views: 1109
Reputation: 280615
It has to scan the entire table because MONTH(column)
and YEAR(column)
are not sargable, and you haven't told SQL Server what you mean by TOP
. While it's true that SQL Server may be able to short circuit onc it's found your 10 rows, it may be so far into the scan when that happens that the difference to you is minimal. This is especially true if you find zero rows or < 10 rows that match your where clause.
A much better WHERE
clause would be:
WHERE SomeDate >= '20110501' AND SomeDate < '20110601';
If you don't want to construct the strings, you can pass those in as parameters / variables and do this:
DECLARE @year INT;
DECLARE @month INT;
SET @year = 2011;
SET @month = 5;
...
WHERE SomeDate >= DATEADD(MONTH, @month-1, DATEADD(YEAR, @year-1900, '19000101'))
AND SomeDate < DATEADD(MONTH, @month, DATEADD(YEAR, @year-1900, '19000101'));
In either case, if there is an index on SomeDate
, it can be used and a table scan can be avoided. You want to avoid a table scan on a table with 500 million rows, even if you're only looking for 10 rows, and even if short circuiting might happen.
Even without a table scan, however, this query is still going to be inefficient. Do you really need all of the columns? If an index on SomeDate
is used the seek will still have to do a lookup into the clustered index or a covering index to retrieve the rest of the columns. If you don't need those columns, don't include them.
And as bluefeet pointed out, this TOP 10
stuff makes no sense if you haven't told SQL Server which 10 you mean, and you do that using ORDER BY
. If the ORDER BY
uses a suitable index you may avoid the additional costly sort operator you might think you're avoiding by not using ORDER BY
anyway.
Upvotes: 7