Reputation: 2434
I have a query that cross joins two tables. TABLE_1
has 15,000 rows and TABLE_2
has 50,000 rows. A query very similar to this one has run in the past in roughly 10 minutes. Now it is running indefinitely with the same server situation (i.e. nothing else running), and the very similar query is also running indefinitely.
SELECT A.KEY_1
,A.FULL_TEXT_1
,B.FULL_TEXT_2
,B.KEY_2
,MDS_DB.MDQ.SIMILARITY(A.FULL_TEXT_1,B.FULL_TEXT_2, 2, 0, 0) AS confidence
FROM #TABLE_1 A
CROSS JOIN #TABLE_2 B
WHERE MDS_DB.MDQ.SIMILARITY(A.FULL_TEXT_1,B.FULL_TEXT_2, 2, 0, 0) >= 0.9
When I run the estimated execution plan for this query, the Nested Loops (Inner Join)
node is estimated at 96% of the execution. The estimated number of rows is 218 million, even though cross joining the tables should result in 15,000 * 50,000 = 750 million rows. When I add INSERT INTO #temp_table
to the beginning of the query, the estimated execution plan puts Insert Into
at 97% and estimates the number of rows as 218 million. In reality, there should be less than 100 matches that have a similarity score above 0.9.
I have read that large differences in estimated vs. actual row counts can impact performance. What could I do to test/fix this?
Upvotes: 0
Views: 119
Reputation: 16068
For better performance, use the minScoreHint parameter. This allows to prevent doing the full similarity calculation for many pairs and early exit.
So this should run quicker:
SELECT A.KEY_1
,A.FULL_TEXT_1
,B.FULL_TEXT_2
,B.KEY_2
,MDS_DB.MDQ.SIMILARITY(A.FULL_TEXT_1,B.FULL_TEXT_2, 2, 0, 0, 0.9) AS confidence
FROM #TABLE_1 A
CROSS JOIN #TABLE_2 B
WHERE MDS_DB.MDQ.SIMILARITY(A.FULL_TEXT_1,B.FULL_TEXT_2, 2, 0, 0, 0.9) >= 0.9
It is not clear from docs if 0.9 results would be included. If not, change 0.9 to 0.89
Upvotes: 1
Reputation: 176
The link provided by scsimon will help you prove whether it's statistics or not. Have the estimates changed significantly since to when it was running fast?
Parallelism spring to mind. If the query was going parallel, but now isn't (e.g. if a server setting has been changed, or statistics) then that could cause significant performance degradation.
Upvotes: 0
Reputation: 1269763
I have read that large differences in estimated vs. actual row counts can impact performance. What could I do to test/fix this?
Yes, this is true. It particularly affects optimizations involving join algorithms, aggregation algorithms, and indexes.
But it is not true for your query. Your query has to do a nested loops join with no indexes. All pairs of values in the two tables need to be compared. There is little algorithmic flexibility and (standard) indexes cannot really help.
Upvotes: 1