Reputation: 9242
Assuming that one has three Tables in a Relational Database as :
Customer(Id, Name, City),
Product(Id, Name, Price),
Orders(Cust_Id, Prod_Id, Date)
My first question is what is the best way to excecute the query: "Get all the Customers who ordered a Product".
Some people propose the query with EXISTS
as:
Select *
From Customer c
Where Exists (Select Cust_Id from Orders o where c.Id=o.cust_Id)
Is the above query equivalent (can it be written?) as:
Select *
From Customer
Where Exists (select Cust_id from Orders o Join Customer c on c.Id=o.cust_Id)
What is the problem when we use IN
instead of EXISTS
apart from the performance as:
Select *
From Customer
Where Customer.Id IN (Select o.cust_Id from Order o )
Do the three above queries return exactly the same records?
Update: How does really the EXISTS evaluation works in the second query (or the first), considering that it checks only if the Subquery returns true or false? What is the "interpretation" of the query i.e.?
Select *
From Customer c
Where Exists (True)
Upvotes: 2
Views: 285
Reputation: 10444
Yes, each of those three should return identical result sets.
Your second query is incorrect, as @ypercube points out in the commends. You're checking whether an uncorrellated subquery EXISTS
Of the two that work (1, 3), I'd expect #3 to be the fastest depending on your tables because it only executes the subquery one time.
However your most effective result is probably none of them but this:
SELECT DISTINCT
c.*
FROM
Customer c
JOIN
Orders o
ON o.[cust_id] = c.[Id]
because it should just be an index scan and a hash.
You should check the query plans and/or benchmark each one.
Upvotes: 2
Reputation: 350
The best way to execute that query is to add orders to the from clause and join to it.
select distinct c.*
from customers c,
orders o
where c.id = o.cust_id
Your other queries may be more inefficient (depending on the shape of the data) but they should all return the same result set.
Upvotes: 0
Reputation: 1270513
The first two queries are different.
The first has a correlated subquery and will return what you want -- information about customers who have an order.
The second has an uncorrelated subquery. It will return either all customers or no customers, depending on whether or not any customers have placed an order.
The third query is an alternative way of expressing what you want.
The only possible issue that I can think of would arise when cust_id
might have NULL values. In such a case, the first and third queries may not return the same results.
Upvotes: 3