Rewriting sql query with a nested subquery

Question

Original Query I'm trying to rewrite:

SELECT Table1.* 
FROM Table1
INNER JOIN Table2 ON Table2.[IDENTITY]=Table1.ID
WHERE Table2.Field1 = @value AND Table2.Field2 = '1' AND Table1.ID in
(
   select Table1.ID from Table1 where Table1.Number in
   (select Table1.Number from Table1 where Table1.ID=@ID) 
)

Note: In real query, I list all the columns instead of using Table1.*

This query is slightly confusing, especially since I changed the names etc for posting. In short it needs to take an ID that's passed in and find all the Table1.Number fields that have that ID. There is a Many to 1 relationship between Number and ID. So then once all the Numbers are found I need to then find the total list of IDs that use any of those numbers.

When I look at the statistics when I run the query I get

Table 'Table1'. Scan count 3873, logical reads 135255, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Table2'. Scan count 0, logical reads 7995, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

I feel the culprit is the nested subquery. I've been trying to write this a different way for a while now, but I just can't seem to figure it out completely. I rewrote the 2nd nested query as:

WHERE Table2.Field1 = @value AND Table2.Field2 = '1' AND Table1.ID in
(
   select Table1.ID from Table1
   INNER JOIN Table1 AS Table3 ON Table1.Number = Table3.Number 
   where Table3.MessageID=@ID
)

Unfortunately this resulted in identical statistics. I can't quite figure out how to remove the 2nd "in" statement.

Is this the best approach? Is there a better one? Am I right that this kind of subquery is very bad performance wise and thus causing the high scans and logical reads the IO statistics are showing me?

EDIT: Original Query used Table2.[IDENTITY]=Table1.MessageID. That should be Table2.[IDENTITY]=Table1.ID. I've updated the above query to reflect this.

Carth · Accepted Answer

I believe you're correct in that the nested subquery is causing the statistics that you're seeing. When you have a value that is translating into a set like this I often find that the best performing solution is to shove the results of the subquery into a temp table and then join against that instead. This prevents the row-based execution of the subquery and should dramatically improve your performance.

The contents of the subquery reference a static statement variable but do not reference any elements from the outer statement. This means that it is not a correlated subquery and that we're simply performing the same operation over and over for every row. When you see a subquery like this it's an easy optimization choice to move the operation outside of the select and reference the data in a more appropriate way.

For the example you provided you would do something like

select distinct Table1.ID 
INTO #myTemp
from Table1 where Table1.Number in
(select Table1.Number from Table1 where Table1.ID=@ID)

This would turn your initial query into

SELECT Table1.* 
FROM Table1
INNER JOIN Table2 ON Table2.[IDENTITY]=Table1.MessageID
INNER JOIN #myTemp a on Table1.ID = a.ID
WHERE Table2.Field1 = @value AND Table2.Field2 = '1'

I'm assuming you're using SQL2008 R2 so your syntax may differ depending on your RDBMS.

Rewriting sql query with a nested subquery

Answers (1)

Related Questions