Jonathan Wood
Jonathan Wood

Reputation: 67195

Speeding ORDER BY clause with index

I have a query with an ORDER BY clause that is slow due to the table having over 11 million rows.

I can speed it up dramatically by adding a clustered index on the column in the ORDER BY clause. However, the software creates the query to order by different columns, depending on user settings. And you cannot add more than one clustered index to a table.

My question is: can non-clustered indexes be used to improve ORDER BY performance? Or is there something special about clustered indexes that means I will not be able to sort quickly for all columns?

Note: I've posted my real query and execution plan online but there are other issues that I don't want to go into here. I didn't create the database or write the query. And the query is still very slow even without the IN clause.

Upvotes: 4

Views: 24941

Answers (3)

Jiulin Teng
Jiulin Teng

Reputation: 364

I would guess that the best solution to this problem would be to

  1. Create a surrogate key as the cluster index
  2. Create composite indices for your queries

For example, you have

SELECT a,b,c
FROM tbl
WHERE x=?,y=?,z=?
ORDER BY j,k,l DESC

Then, you create composite index

INDEX xyz_jkl (x,y,z,j,k,l DESC)

This way, you optimize for each query.

The surrogate key is important for queries outside this table. Having it being an AUTO_INCREMENT field also makes INSERT faster.

Also keep in mind that the PRIMARY KEY (clustered index) is always included in the index.

Upvotes: 1

J Weezy
J Weezy

Reputation: 3945

Regarding @SeanLange comment on indexes being an art rather than a science, the best foo bar I have seen is where all columns of table were in the primary key. Further, if you are not careful and just create indexes based off every query execution plan, you will probably end up storing more data in indexes than what is on the actual table.

The idea here is to use covered queries. For your case, I've seen clustered indexes that are on an identity field where a non-clustered index contains the primary key (usually a composite primary key) that includes the clustered index column. From there, SELECT based on the primary key and order on the clustered index (its already sorted).

Update:

I just saw the query execution plan. You are getting hit with a table scan, which means that none of the columns in the WHERE clause are contained in either the primary key or an index. As far as the optimizer is concerned, the table is running in heap. Therefore, any index you add that contains (i.e., covers) the columns that are contained within the WHERE clause are likely to be used. As a result, the query will return much faster.

Ideally, you want to see Index Seeks followed by Index Scans. Usually, the optimizer will look for the unique identifier by its ordinal position in the index. What this means is that if the identity column is the first column listed in the index then you should be rewarded with an index seek. If the first column in the index is non-unique, then you will get an index scan. I would not say these are hard and fast rules, but that is my understanding based on the literature I have read and execution plans that I have seen.

Upvotes: 1

Brandon
Brandon

Reputation: 10028

Non-clustered indexes can absolutely be used to optimize away a sort. Indexes are essentially binary search trees, which means they contain the values sorted in order.

However, depending on the query, you may be putting SQL Server into a conundrum.

If you have a table with 100 million rows, your query will match 11 million of them, like below, is it cheaper to use an index on category to select the rows and sort the results by name, or to read all 100 million rows out of the index pre-sorted by name, and then filter down 89 million of them by checking the category?

select ...
from product
where category = ?
order by name;

In theory, SQL Server may be able to use an index on name to read rows in order and use the index on category to filter efficiently? I'm skeptical. I have rarely seen SQL Server use multiple indexes to access the same table in the same query (assuming a single table selection, ignoring joins or recursive CTE's). It would have to check the index 100 million times. Indexes have a high overhead cost per index search, so they are efficient when a single search narrows down the result set by a lot.

Without seeing a schema, statistics, and exact query, it's hard for me to say what makes sense, but I expect I would find SQL Server would use an index for the where clause and sort the results, ignoring an index on the sort column.

An index on the sort column may be used if you are selecting the entire table though. Like select ... from product order by name;

Again, your milage may vary. This is speculation based off past experience.

Upvotes: 11

Related Questions