Reputation: 3104
I have this query, actually it is similar to this, this is for testing purpose:
SELECT
[Distinct1].[ProspectID] AS [ProspectID]
FROM ( SELECT DISTINCT
[Limit1].[ProspectID] AS [ProspectID]
FROM ( SELECT TOP 10
[Extent1].[ProspectID] AS [ProspectID]
FROM [sqlsolvent].[crm_SearchTable] AS [Extent1]
WHERE ([Extent1].[CP_Name] LIKE '%dsd%' ESCAPE N'~')
) AS [Limit1]
) AS [Distinct1]
This is the execution plan for it:
And when I remove TOP 10
from query I get this execution plan:
The second execution plan without TOP 10
is 2 times faster, can someone explain why TOP changes execution plan so much and why it so slower? It is the same even if query doesn't return any result, shouldn't top just be aplied to the result set, why does it have so much impact on performance when query returns nothing?
Upvotes: 4
Views: 98
Reputation: 688
The second query is split into parallel streams which, depending on machine configuration, will increase performance. The TOP operator won't do this as it is trying to get as many rows as it needs until the TOP condition is satisfied, not all rows.
If your where clause is strict, the table scan needs to scan more of the table until the TOP condition is met. If your where clause is not strict (e.g. it matches a large percentage of the total rows) then the query with TOP may be quicker than the query without TOP.
Upvotes: 2
Reputation: 63732
Note that there's an error in your query - top
must be used along with some explicit ordering, otherwise the ordering of the result is undefined, and as such, the results of the top
itself are undefined.
I can't tell you with certainty what's happening, but imposing explicit ordering might make more sense of your results - either it will allow for a more reasonable execution plan, or it will make your "top-less" query slower.
In any case, you are imposing an ordering and a limit before doing the distinct. While your top-less query can use indices to do the distinct
(on the whole table), those are no longer available when you use top
, especially since your filter doesn't allow the use of any index (like '%whatever%'
is a great way to kill performance :)). Since the column you're distinct
ing by is a clustered index, the distinct
is actually very cheap - as long as you can use that index.
All in all, I'd have a look at how the two queries perform with more data. The slowdown might very well turn into a speedup with a large amount of realistic data - it's very hard to guess around these things :)
Upvotes: 2
Reputation: 172408
You can refer the support article from Microsoft which provides the workaround
Workaround
To work around this issue, craft a query that collects the TOP N elements of each partition. Then, find the TOP N elements from that collection of elements.
You can refer this article to get the details:
The Top operator, when called by the SELECT operator, calls the Nested Loops operator to get a row; returns it to the SELECT operator; and then sits waiting until it is called again – which normally will happen fairly quickly, unless the SELECT operator has to wait for the application or network to send out the row. The same thing happens two more times, but when the SELECT operator calls the Top operator for the fourth time, the Top operator will immediately return an “end of data” condition without ever bothering to call its descendant Nested Loops node. No matter how many rows the SalesOrderHeader table has, the TOP clause guarantees that there will never be more than three lookups. The Sort operator we had in the first plan was cheaper than 31,465 lookups, but is far more expensive than just three lookups.
Upvotes: 1