Mohsin JK
Mohsin JK

Reputation: 571

SQL inner join vs subquery

I am working on following queries:

Query 1: SELECT * From TabA INNER JOIN TabB on TabA.Id=TabB.Id
Query 2: SELECT * From TabA WHERE Id in (SELECT Id FROM TabB)
Query 3: SELECT TabA.* From TabA INNER JOIN TabB on TabA.Id=TabB.Id

I investigate these queries with SQL Server profiler and found some interesting facts.

TabA 48716 rows

TabB 62719 rows

Basically what I am asking is why Query 1 is taking long time, not Query 3. I already know that 'sub query' is slower than inner join but here Query 2 is fastest; why?

Upvotes: 8

Views: 9120

Answers (4)

juanora
juanora

Reputation: 542

If the join is one to many, is possible that what is taking time is the repeated data. Instead, you can format a set of related rows as JSON array. Check the "Use Case 1" on https://blogs.msdn.microsoft.com/sqlserverstorageengine/2015/10/09/returning-child-rows-formatted-as-json-in-sql-server-queries/

Upvotes: 0

Kenneth Fisher
Kenneth Fisher

Reputation: 3812

If I had to guess I would say it's because query 1 is pulling the data from both tables. Queries 2 and 3 (aprox the same time) are only pulling data for TabA.

One way you could check this is by running the following:

SET STATISTICS TIME ON
SET STATISTICS IO ON

When I ran

SELECT * FROM sys.objects

I saw the following results.

SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 104 ms.

(242 row(s) affected)
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'sysschobjs'. Scan count 1, logical reads 10, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'syssingleobjrefs'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'syspalnames'. Scan count 1, logical reads 2, physical reads 1, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 866 ms.

You can take a look at # of scans, logical reads and physical reads for each query. Physical reads of course take much longer and represent reading from the disk into the cache. If all of your reads are logical reads then your table is completely in cache.

I would be willing to bet if you look you will see a lot more logical reads on TabB on query 1 than on 2 and 3.

EDIT:

Just out of curiosity I did some tests and blogged the results here.

Upvotes: 4

Eric J. Price
Eric J. Price

Reputation: 2785

Query 1:
This query is returning rows from all rows in both TabA and TabB so the covering index for both tables requires all rows from each table to be included. To see exactly what's going on you'd want to look at the query plan.

Query 2 & Query 3:
You're returning all rows from TabA and you only need the index for the Id column for TabB. I'm guessing the difference here has something to do with the table statistics, but (once again) we'd have to see the query plan to know exactly what's going on.

Upvotes: 2

Captain Kenpachi
Captain Kenpachi

Reputation: 7215

It's simply because SQL doesn't have to perform a JOIN. You are just performing two queries, and only one of them has a WHERE clause.

I must admit I didn't expect SUCH a big difference.

Upvotes: 1

Related Questions