Reputation: 571
I am working on following queries:
Query 1: SELECT * From TabA INNER JOIN TabB on TabA.Id=TabB.Id
Query 2: SELECT * From TabA WHERE Id in (SELECT Id FROM TabB)
Query 3: SELECT TabA.* From TabA INNER JOIN TabB on TabA.Id=TabB.Id
I investigate these queries with SQL Server profiler and found some interesting facts.
TabA 48716 rows
TabB 62719 rows
Basically what I am asking is why Query 1 is taking long time, not Query 3. I already know that 'sub query' is slower than inner join but here Query 2 is fastest; why?
Upvotes: 8
Views: 9120
Reputation: 542
If the join is one to many, is possible that what is taking time is the repeated data. Instead, you can format a set of related rows as JSON array. Check the "Use Case 1" on https://blogs.msdn.microsoft.com/sqlserverstorageengine/2015/10/09/returning-child-rows-formatted-as-json-in-sql-server-queries/
Upvotes: 0
Reputation: 3812
If I had to guess I would say it's because query 1 is pulling the data from both tables. Queries 2 and 3 (aprox the same time) are only pulling data for TabA.
One way you could check this is by running the following:
SET STATISTICS TIME ON
SET STATISTICS IO ON
When I ran
SELECT * FROM sys.objects
I saw the following results.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 104 ms.
(242 row(s) affected)
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'sysschobjs'. Scan count 1, logical reads 10, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'syssingleobjrefs'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'syspalnames'. Scan count 1, logical reads 2, physical reads 1, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 866 ms.
You can take a look at # of scans, logical reads and physical reads for each query. Physical reads of course take much longer and represent reading from the disk into the cache. If all of your reads are logical reads then your table is completely in cache.
I would be willing to bet if you look you will see a lot more logical reads on TabB on query 1 than on 2 and 3.
EDIT:
Just out of curiosity I did some tests and blogged the results here.
Upvotes: 4
Reputation: 2785
Query 1:
This query is returning rows from all rows in both TabA and TabB so the covering index for both tables requires all rows from each table to be included. To see exactly what's going on you'd want to look at the query plan.
Query 2 & Query 3:
You're returning all rows from TabA and you only need the index for the Id column for TabB. I'm guessing the difference here has something to do with the table statistics, but (once again) we'd have to see the query plan to know exactly what's going on.
Upvotes: 2
Reputation: 7215
It's simply because SQL doesn't have to perform a JOIN. You are just performing two queries, and only one of them has a WHERE clause.
I must admit I didn't expect SUCH a big difference.
Upvotes: 1