orBeat
orBeat

Reputation: 125

SQL adding Order By clause causes the query to run significantly faster. Explanation needed

So i have this query:

SELECT * 
FROM ViewTechnicianStatus
WHERE NotificationClass = 2 
    AND MachineID IN (SELECT ID FROM MachinesTable WHERE DepartmentID = 1 AND IsMachineActive <> 0)
--ORDER BY ResponseDate DESC

The view is huge and complex with a lot of joins and subqueries. When i run this query it takes forever to finish, however if i add the ORDER BY it finishes instantly and returns 20 rows as intended. I don't understand how adding the ORDER BY could have such a huge positive impact on the performance. Would love if somebody could explain to me the phenomenon.

EDIT: Here is the rundown with SET STATISTICS TIME, IO ON; flags on. Sorry for the hidden table names but i don't think i can expose those.

Without ORDER BY Without ORDER BY

With Order BY enter image description here

Upvotes: 3

Views: 1310

Answers (2)

seanb
seanb

Reputation: 6685

Summary

OK.. I've thought about this for a while as I think it's an interesting issue. I believe it's very much an edge case - which is part of what makes it interesting.

I'm taking an educated guess based on the info provided - obviously, without being able to see it/play with it, I cannot be certain. But I think this explanation matches the evidence based on the info you provide and the statistics.

I think the main issue is a poor query plan. In the version without the sort, it uses an inappropriate nested loop; in the version with the sort, it does (say) a hash match or merge join.

I have found that SQL Server often has issues with query plans within complex views that reference other views, and especially if those sub-views have group bys/sorts/etc.

For demonstration of what difference that could occur, I'll simplify your complex view into 2 subgroups I'll call 'view groups' (which may be one or several views and tables - not a technical term, just a term to summarise them).

  • The first view group contains most tables,
  • The second view group contains the views getting data from tables 6 and 7.

For both approaches, how SQL uses the data in the view groups are probably the same (e.g., use the same indexes, etc). However, there's a difference in its approach to how it does the join between the two view groups.

Example - query planner underestimates cost of view group 2 and doesn't care which method is used

I'm guessing

  • The first view group, at the point of the join, is dealing with about 3000 rows (it hasn't filtered it down yet), and
  • The query builder thinks view group 2 is easy to run

In the version without the order by, the query plan is designed with a nested loop join. That is, it gets each value in view group 1, and then for each value it runs view group 2 to get the relevant data. This means the view group 2 is run 3000-ish times (once for each value in view group 1).

In the version with the order by, it decides to do (say) a hash match between view group 1 and view group 2. This means it has to only run view group 2 once, but spend a bit more time sorting it. However, because you asked for it to be sorted anyway, it chooses the hash match.

However, because the query designed underestimated the cost of view group 2, it turns out that the hash match is a much much better query plan for the circumstances.

Example - query planner use of cached plans

I believe (but may be wrong!) that when you reference views within views, it can often just used cached plans for the sub-views rather than trying to get the best plan possible for your current situation.

It may be that one of your views uses the "cached plan" whereas the other one tries to optimise the query plan including the sub-views.

Ironically, it may be that the query version with the order by is more complex, and in this case it uses the cached plans for view group 2. However, as it knows it hasn't optimised the plan for view group 2, it simply gets the data once for view group 2, then keeps all results in memory and uses it in a hash match.

In contrast, in the version without the order by, it takes a shot at optimising the query plan (including optimising how it uses the views), and makes a mess of it.

Possible solutions

These are all possibilities - they may make it better or may make it worse! Note that SQL is a declarative language (you tell the computer what to do/what you want, but not how to do it).

This is not a comprehensive list of possibilities, but they are things you can try

  • Pre-calculate all or part(s) of the views (e.g., put the pre-calculated stuff from tables 6 and 7 into a temporary table, then use the temporary tables in the views)
  • Simplify the SQL and/or move all the SQL into a single view that doesn't call other views
  • Use join hints e.g., instead of INNER JOIN, use INNER HASH JOIN in the appropriate position
  • Use OPTION(RECOMPILE)

Upvotes: 0

Ankit Das
Ankit Das

Reputation: 650

To answer your question, The reason that your query runs faster when adding order by is due to the INDEXING. Probably in all the Clients that you tested, had Indexing for those specific fields/tables, and on using the Order by made the performance better.

Upvotes: 1

Related Questions