RMorrisey
RMorrisey

Reputation: 7739

How can I optimize a join with multi-table sort in T-SQL?

How can I optimize the following query?

   SELECT TOP 50 *
     FROM A 
LEFT JOIN B ON A.b_id = B.id 
 ORDER BY A.number, B.name DESC

I created a non-clustered index on (A.number asc, A.creation_date desc), which includes all columns from A, and another non-clustered index on B.origination_date desc, which includes all columns from B (except text columns). Neither of these indices are used, according to the actual execution plan from SQL Server Management Studio.

The thing that seems to be causing the performance hit is the B.origination_date sort. When I examine the actual execution plan in SQL Server Management Studio, I see that "Top N Sort" on these three fields takes up 91% of the execution time. If I drop off the sort on B.origination_date, the query completes almost instantaneously, using the index on A.

Edit: Updated the query to provide a better, simpler example.

Upvotes: 4

Views: 2642

Answers (3)

Ted Elliott
Ted Elliott

Reputation: 3493

Since you're sorting on columns from two different tables, SQL Server has to join the tables and then do the sort. Once the tables are joined, the indexes on the individual tables are no help to the sort. An indexed view might be your best bet.

Upvotes: 2

Philip Kelley
Philip Kelley

Reputation: 40309

Without hands-on access, it’s hard to come up with hard and fast solutions. Some ideas and suggestions:

Without the join on table B, all SQL has to do (with the index on A.Number) is walk through until it finds the first 50 rows that match your pattern. If the values of “Number” are relatively unique (not many duplicates [this is cardinality]), there’s little value in having Creation_Date in the index as well.

Why the left outer join into B? Is it one to [zero or one], or one to [zero or many]? If the cardinality is low (many duplicates in A) then the join is required to clearly find the “first 50”, otherwise one would think the join wouldn’t impact performance beyond the need to perform the join). I can’t see any index on B (besides on column id) making any difference here. Um, you do have an index on B.Id, right? If not, that could slow things down tremendously (presuming that B has a significant number of rows, of course).

For more sepcifics, I’d want to review the cardinality of the join and order by columns, and look very closely at the execution plan of the “with join” query.


Addenda

If A has low cardinality (many duplicates), then the query optimizer may "think" that it will have to use a lot B.Id to resolve ordering (which must be done to find the Top 50). This might explain why it does what it does.

If they will produce 100% equivalent results, I would recommend replacing the LEFT join with an INNER join. In general, query plans can become much simpler when more restrictive join conditions are in place.

Upvotes: 1

HLGEM
HLGEM

Reputation: 96552

I would guess A.number like '%%' is your problem. What is this intended to do? You should not be using a like with a wildcard as the first character if you want to use the indexes. As this stands it appears to be filtering for nothing as tere is nothing between the wildcards.

Upvotes: 5

Related Questions