Razzle Dazzle
Razzle Dazzle

Reputation: 521

SQL query performance/SELECT TOP X behavior

was wondering why this query performs slowly. If anyone could walk me through how its processed that would be great. The DB being queried has over 500 million rows. Is this query really that poorly written that a TOP 10 takes so long to complete it may as well never finish? How might I improve the query assuming I still want to query data by month+year?

 SELECT TOP 10 *
 FROM ADB.dbo.Stuff tt
 WHERE MONTH(tt.SomeDate) = 5
 AND
 YEAR(tt.SomeDate) = 2011

Does SELECT TOP 10 not just halt after 10 results have been acquired? Or does it take so long because it hasn't found my conditions yet while going through the 500m+ rows?

Thanks and sorry for such a simple question.

Upvotes: 0

Views: 1109

Answers (1)

Aaron Bertrand
Aaron Bertrand

Reputation: 280615

It has to scan the entire table because MONTH(column) and YEAR(column) are not sargable, and you haven't told SQL Server what you mean by TOP. While it's true that SQL Server may be able to short circuit onc it's found your 10 rows, it may be so far into the scan when that happens that the difference to you is minimal. This is especially true if you find zero rows or < 10 rows that match your where clause.

A much better WHERE clause would be:

WHERE SomeDate >= '20110501' AND SomeDate < '20110601';

If you don't want to construct the strings, you can pass those in as parameters / variables and do this:

DECLARE @year INT;
DECLARE @month INT;

SET @year = 2011;
SET @month = 5;

...
WHERE SomeDate >= DATEADD(MONTH, @month-1, DATEADD(YEAR, @year-1900, '19000101'))
  AND SomeDate <  DATEADD(MONTH, @month,   DATEADD(YEAR, @year-1900, '19000101'));

In either case, if there is an index on SomeDate, it can be used and a table scan can be avoided. You want to avoid a table scan on a table with 500 million rows, even if you're only looking for 10 rows, and even if short circuiting might happen.

Even without a table scan, however, this query is still going to be inefficient. Do you really need all of the columns? If an index on SomeDate is used the seek will still have to do a lookup into the clustered index or a covering index to retrieve the rest of the columns. If you don't need those columns, don't include them.

And as bluefeet pointed out, this TOP 10 stuff makes no sense if you haven't told SQL Server which 10 you mean, and you do that using ORDER BY. If the ORDER BY uses a suitable index you may avoid the additional costly sort operator you might think you're avoiding by not using ORDER BY anyway.

Upvotes: 7

Related Questions