Steam
Steam

Reputation: 9856

Which is better - subqueries only or joins?

I have two tables in SQL Server 2008 -

Sales.SalesOrderHeader --> CustomerID(FK, int, not null), OrderDate
(datetime, not null), etc...
Sales.Individual --> CustomerID(PK, FK, int, not null), ContactID
(FK, int, not null), etc...

I have to find the the customers (ie CustomerID and the Corresponding ContactID) that ordered something on the last day orders were placed.

This query is used to find the last OrderDate

   select MAX(Soh.OrderDate) 
   from Sales.SalesOrderHeader as Soh

Now, the next thing to do is get the CustomerID and Contact ID. I thought of two ways - Using only sub queries and where clauses OR Join and one sub query. The two approaches are shown below:

--Style1: Using only subquery

select Si.CustomerID, Si.ContactID
from Sales.Individual as Si
where Si.CustomerID in
( 
  select Soh.CustomerID
  from Sales.SalesOrderHeader as Soh
  where Soh.OrderDate = 
  (
    select MAX(Soh.OrderDate) 
    from Sales.SalesOrderHeader as Soh
  )
) 
order by Si.CustomerID, Si.ContactID

--Style 2: Using inner join

select CustOnLastDay.CustomerID, Si.ContactID
from
(
  select Soh.CustomerID, Soh.ContactID
  from Sales.SalesOrderHeader as Soh
  where Soh.OrderDate = 
  (
    select MAX(Soh.OrderDate) 
    from Sales.SalesOrderHeader as Soh
  )
) as CustOnLastDay
inner join Sales.Individual as Si
on CustOnLastDay.ContactID = Si.ContactID
order by Si.CustomerID, Si.ContactID

Question - Which is better, sub queries only or joins (In general and this case) ?
Btw, most of my tables don't have more than 14-15k rows.

Thanks.

Upvotes: 2

Views: 653

Answers (4)

Rahul Tripathi
Rahul Tripathi

Reputation: 172378

In JOINs RDBMS can create an execution plan which makes it faster as compared to sub query. In many cases you will find that JOINS are comparatively faster than subquery. However when they are functionaly equivalent, they will perform the same. Subquery loads all the data for processing

MSDN says:-

Many Transact-SQL statements that include subqueries can be alternatively formulated as joins. Other questions can be posed only with subqueries. In Transact-SQL, there is usually no performance difference between a statement that includes a subquery and a semantically equivalent version that does not. However, in some cases where existence must be checked, a join yields better performance. Otherwise, the nested query must be processed for each result of the outer query to ensure elimination of duplicates. In such cases, a join approach would yield better results.

For example:-

If you are doing something like this:-

select * from table1 where exists select * from table2 where table2.parent=table1.id

then it is better to use JOIN

Check this Example which explains the difference between SUBQUERY and JOIN performance:-

USE AdventureWorks
GO
-- use of =
SELECT *
FROM HumanResources.Employee E
WHERE E.EmployeeID = ( SELECT EA.EmployeeID
FROM HumanResources.EmployeeAddress EA
WHERE EA.EmployeeID = E.EmployeeID)
GO
-- use of in
SELECT *
FROM HumanResources.Employee E
WHERE E.EmployeeID IN ( SELECT EA.EmployeeID
FROM HumanResources.EmployeeAddress EA
WHERE EA.EmployeeID = E.EmployeeID)
GO
-- use of exists
SELECT *
FROM HumanResources.Employee E
WHERE EXISTS ( SELECT EA.EmployeeID
FROM HumanResources.EmployeeAddress EA
WHERE EA.EmployeeID = E.EmployeeID)
GO
-- Use of Join
SELECT *
FROM HumanResources.Employee E
INNER JOIN HumanResources.EmployeeAddress EA ON E.EmployeeID = EA.EmployeeID
GO

Now compare the execution plan:-

enter image description here

Upvotes: 4

Matt Stephenson
Matt Stephenson

Reputation: 8610

Neither! Look into the ranking functions for some inspiration. ANSI SQL calls these window functions. You may find DENSE_RANK() is exactly what you need to grab the latest date in an elegant way:

select *
from (
    select Si.CustomerID, Si.ContactID,
    DENSE_RANK() OVER(ORDER BY soh.OrderDate DESC) as DenseRank 
    from Sales.Individual Si
    inner join SalesOrderHeader soh on soh.CustomerId = Si.CustomerId
) subquery
where subquery.DenseRank = 1 

Upvotes: 0

DRapp
DRapp

Reputation: 48139

Inner joins are typically faster than subqueries as the subqueries are typically executed each time for whatever the current record is in the first part of the from clause...

Also, you could do better by having the MAX() order date as it's own query so it's not done on every record and join down stream. Ensure you have index on your order header table. I would have it based on (orderdate,customerid) so it's a covering index and doesn't need to go back to the raw data pages for other criteria as the order date and customer ID are in the index that it can utilize.

I would revise the query as...

select
      soh2.CustomerID,
      si.ContactID
   from
      ( select max( soh.orderdate ) MaxDate
           from sales.salesorderheader soh ) as JustDate
         join sales.salesorderheader soh2
            on JustDate.MaxDate = soh2.OrderDate
            join Sales.Individual SI
               on soh2.CustomerID = SI.CustomerID
   order by
      soh2.CustomerID, 
      si.ContactID

Upvotes: 0

user353gre3
user353gre3

Reputation: 2755

Join is generally preferred to the multi-level nesting with subqueries. In general joins are faster as SQL server Engine can better optimize this type of query.

Upvotes: 0

Related Questions