Reputation: 9856
I have two tables in SQL Server 2008 -
Sales.SalesOrderHeader --> CustomerID(FK, int, not null), OrderDate
(datetime, not null), etc...
Sales.Individual --> CustomerID(PK, FK, int, not null), ContactID
(FK, int, not null), etc...
I have to find the the customers (ie CustomerID and the Corresponding ContactID) that ordered something on the last day orders were placed.
This query is used to find the last OrderDate
select MAX(Soh.OrderDate)
from Sales.SalesOrderHeader as Soh
Now, the next thing to do is get the CustomerID and Contact ID. I thought of two ways - Using only sub queries and where clauses OR Join and one sub query. The two approaches are shown below:
--Style1: Using only subquery
select Si.CustomerID, Si.ContactID
from Sales.Individual as Si
where Si.CustomerID in
(
select Soh.CustomerID
from Sales.SalesOrderHeader as Soh
where Soh.OrderDate =
(
select MAX(Soh.OrderDate)
from Sales.SalesOrderHeader as Soh
)
)
order by Si.CustomerID, Si.ContactID
--Style 2: Using inner join
select CustOnLastDay.CustomerID, Si.ContactID
from
(
select Soh.CustomerID, Soh.ContactID
from Sales.SalesOrderHeader as Soh
where Soh.OrderDate =
(
select MAX(Soh.OrderDate)
from Sales.SalesOrderHeader as Soh
)
) as CustOnLastDay
inner join Sales.Individual as Si
on CustOnLastDay.ContactID = Si.ContactID
order by Si.CustomerID, Si.ContactID
Question - Which is better, sub queries only or joins (In general and this case) ?
Btw, most of my tables don't have more than 14-15k rows.
Thanks.
Upvotes: 2
Views: 653
Reputation: 172378
In JOINs RDBMS can create an execution plan which makes it faster as compared to sub query. In many cases you will find that JOINS are comparatively faster than subquery. However when they are functionaly equivalent, they will perform the same. Subquery loads all the data for processing
MSDN says:-
Many Transact-SQL statements that include subqueries can be alternatively formulated as joins. Other questions can be posed only with subqueries. In Transact-SQL, there is usually no performance difference between a statement that includes a subquery and a semantically equivalent version that does not. However, in some cases where existence must be checked, a join yields better performance. Otherwise, the nested query must be processed for each result of the outer query to ensure elimination of duplicates. In such cases, a join approach would yield better results.
For example:-
If you are doing something like this:-
select * from table1 where exists select * from table2 where table2.parent=table1.id
then it is better to use JOIN
Check this Example which explains the difference between SUBQUERY and JOIN performance:-
USE AdventureWorks
GO
-- use of =
SELECT *
FROM HumanResources.Employee E
WHERE E.EmployeeID = ( SELECT EA.EmployeeID
FROM HumanResources.EmployeeAddress EA
WHERE EA.EmployeeID = E.EmployeeID)
GO
-- use of in
SELECT *
FROM HumanResources.Employee E
WHERE E.EmployeeID IN ( SELECT EA.EmployeeID
FROM HumanResources.EmployeeAddress EA
WHERE EA.EmployeeID = E.EmployeeID)
GO
-- use of exists
SELECT *
FROM HumanResources.Employee E
WHERE EXISTS ( SELECT EA.EmployeeID
FROM HumanResources.EmployeeAddress EA
WHERE EA.EmployeeID = E.EmployeeID)
GO
-- Use of Join
SELECT *
FROM HumanResources.Employee E
INNER JOIN HumanResources.EmployeeAddress EA ON E.EmployeeID = EA.EmployeeID
GO
Now compare the execution plan:-
Upvotes: 4
Reputation: 8610
Neither! Look into the ranking functions for some inspiration. ANSI SQL calls these window functions. You may find DENSE_RANK() is exactly what you need to grab the latest date in an elegant way:
select *
from (
select Si.CustomerID, Si.ContactID,
DENSE_RANK() OVER(ORDER BY soh.OrderDate DESC) as DenseRank
from Sales.Individual Si
inner join SalesOrderHeader soh on soh.CustomerId = Si.CustomerId
) subquery
where subquery.DenseRank = 1
Upvotes: 0
Reputation: 48139
Inner joins are typically faster than subqueries as the subqueries are typically executed each time for whatever the current record is in the first part of the from clause...
Also, you could do better by having the MAX() order date as it's own query so it's not done on every record and join down stream. Ensure you have index on your order header table. I would have it based on (orderdate,customerid) so it's a covering index and doesn't need to go back to the raw data pages for other criteria as the order date and customer ID are in the index that it can utilize.
I would revise the query as...
select
soh2.CustomerID,
si.ContactID
from
( select max( soh.orderdate ) MaxDate
from sales.salesorderheader soh ) as JustDate
join sales.salesorderheader soh2
on JustDate.MaxDate = soh2.OrderDate
join Sales.Individual SI
on soh2.CustomerID = SI.CustomerID
order by
soh2.CustomerID,
si.ContactID
Upvotes: 0
Reputation: 2755
Join is generally preferred to the multi-level nesting with subqueries. In general joins are faster as SQL server Engine can better optimize this type of query.
Upvotes: 0