Reputation: 12661
I'm working with a productiondatabase and I need to rearrange my resources into multiple CTE's in order to get the correct formats to match my data.
This results in a CTE1
with roughly 7000 records (takes few seconds) and a CTE2
with roughly 55000 records (takes less than a second).
Next step is to join both CTE's based on two columns. This step took 19 minutes with CTE2 limited on a TOP 1000
! Since these are CTE's, there is not a single index. My question is how can I make this match within a decent time (even a minute or two would suffice).
CTE1 consists of four alphanumeric columns with a maximum of 8 characters.
CTE2 consists of three alphanumeric columns with a maximum of 8 characters and two datetimes.
The inner join
matches on 2 alphanumeric columns from CTE1
with 2 alphanumeric columns from CTE2
.
Example CTE1 (7000 records)
A | B | C | D
---------------------------------------------------------------------------
ABC | 123 | 1234567 | 12345678
A
has two distinct values, matching distinct values from CTE2.A
.B
has about 6500 distincts. Not all values are represented in CTE2.B
.Example CTE2 (55000 records)
A | B | C | D | E
---------------------------------------------------------------------------
ABC | 123 | XYZ | 2013-10-11 15:00:00.000 | 2013-10-11 16:00:00.000
A
has two distinct values, matching distinct values from CTE1.A
.B
has about 2000 distincts. Not all values (though most) are respresented in CTE1.B
.Query (TOP 1000 of CTE2 takes 19 minutes with this query)
SELECT CTE1.C
,CTE1.D
,CTE2.B
,CTE2.C
,CTE2.D
,CTE2.E
FROM CTE1
INNER JOIN CTE2 ON CTE1.A = CTE2.A
AND CTE1.B = CTE2.B
Upvotes: 3
Views: 2010
Reputation: 69819
The problem is here that although your CTEs both perform well on their own, when you join them you are not just saying join the results of CTE1 to the results of CTE2, you are joining the two query definitions, so each is not executed in their own right before being joined together, SQL Server will find what it thinks is the optimal way to join these two queries together, and sometimes, not very often it results in an unusual and not optimal execution plan.
There is a connect item requesting something equivalent to the NOEXPAND
hint for CTEs, so you tell the optimiser not to expand the definitions. If you don't want to expand the definition out though, it seems to me the same as using temporary tables (apart from you can't use temporary tables in a view), so I think this could be your best work around, instead of using two CTEs, use two temporary tables and join these together.
Alternatively you can carefully examine the execution plans of the two CTEs on their own, and compare to the execution plan of the two combined, and find out where all the additional cost comes from, what indexes are no longer being used etc, but without your exeution plans I can't even begin to guess at what the problem is.
Upvotes: 1
Reputation: 1271151
This is expanding on Gareth's answer.
SQL Server does not execute CTEs separately and then combine them in the final query. This is a known issue, and there is a request for a query hint that would force materialization of CTEs. You can vote on the request here.
Instead, it takes the CTEs as code, stuffs them into the query, and then "optimizes" the whole thing. With long, complex queries, it might make a mistake.
The most common mistake that affects performance is to change a join to a nested loop join where there is no index defined. You can see if this is happening by looking at the execution plan. If it contains nested loops, then you might be in trouble.
If so, try running the query with OPTION (HASH JOIN, MERGE JOIN)
to avoid nested loop joins.
Upvotes: 4
Reputation: 15433
Having TOP n
without an ORDER BY
on non indexed data might be an issue here.
You may also try to "force" priority of the join on column B with something like
SELECT CTE1.C
,CTE1.D
,CTE2.B
,CTE2.C
,CTE2.D
,CTE2.E
FROM CTE1
LEFT OUTER JOIN CTE2 ON CTE1.B = CTE2.B
WHERE CTE1.A = CTE2.A
AND NOT CTE2.B IS NULL
Upvotes: 0