Reputation: 259
Original Query I'm trying to rewrite:
SELECT Table1.*
FROM Table1
INNER JOIN Table2 ON Table2.[IDENTITY]=Table1.ID
WHERE Table2.Field1 = @value AND Table2.Field2 = '1' AND Table1.ID in
(
select Table1.ID from Table1 where Table1.Number in
(select Table1.Number from Table1 where Table1.ID=@ID)
)
Note: In real query, I list all the columns instead of using Table1.*
This query is slightly confusing, especially since I changed the names etc for posting. In short it needs to take an ID that's passed in and find all the Table1.Number fields that have that ID. There is a Many to 1 relationship between Number and ID. So then once all the Numbers are found I need to then find the total list of IDs that use any of those numbers.
When I look at the statistics when I run the query I get
Table 'Table1'. Scan count 3873, logical reads 135255, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Table2'. Scan count 0, logical reads 7995, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
I feel the culprit is the nested subquery. I've been trying to write this a different way for a while now, but I just can't seem to figure it out completely. I rewrote the 2nd nested query as:
WHERE Table2.Field1 = @value AND Table2.Field2 = '1' AND Table1.ID in
(
select Table1.ID from Table1
INNER JOIN Table1 AS Table3 ON Table1.Number = Table3.Number
where Table3.MessageID=@ID
)
Unfortunately this resulted in identical statistics. I can't quite figure out how to remove the 2nd "in" statement.
Is this the best approach? Is there a better one? Am I right that this kind of subquery is very bad performance wise and thus causing the high scans and logical reads the IO statistics are showing me?
EDIT: Original Query used Table2.[IDENTITY]=Table1.MessageID. That should be Table2.[IDENTITY]=Table1.ID. I've updated the above query to reflect this.
Upvotes: 1
Views: 237
Reputation: 2343
I believe you're correct in that the nested subquery is causing the statistics that you're seeing. When you have a value that is translating into a set like this I often find that the best performing solution is to shove the results of the subquery into a temp table and then join against that instead. This prevents the row-based execution of the subquery and should dramatically improve your performance.
The contents of the subquery reference a static statement variable but do not reference any elements from the outer statement. This means that it is not a correlated subquery and that we're simply performing the same operation over and over for every row. When you see a subquery like this it's an easy optimization choice to move the operation outside of the select and reference the data in a more appropriate way.
For the example you provided you would do something like
select distinct Table1.ID
INTO #myTemp
from Table1 where Table1.Number in
(select Table1.Number from Table1 where Table1.ID=@ID)
This would turn your initial query into
SELECT Table1.*
FROM Table1
INNER JOIN Table2 ON Table2.[IDENTITY]=Table1.MessageID
INNER JOIN #myTemp a on Table1.ID = a.ID
WHERE Table2.Field1 = @value AND Table2.Field2 = '1'
I'm assuming you're using SQL2008 R2 so your syntax may differ depending on your RDBMS.
Upvotes: 2