arjacsoh
arjacsoh

Reputation: 9242

Use of IN and EXISTS in SQL

Assuming that one has three Tables in a Relational Database as :

Customer(Id, Name, City),
Product(Id, Name, Price),
Orders(Cust_Id, Prod_Id, Date)

My first question is what is the best way to excecute the query: "Get all the Customers who ordered a Product". Some people propose the query with EXISTS as:

Select *
From Customer c
Where Exists (Select Cust_Id from Orders o where c.Id=o.cust_Id) 

Is the above query equivalent (can it be written?) as:

 Select *
 From Customer
 Where Exists (select Cust_id from Orders o Join Customer c on c.Id=o.cust_Id)

What is the problem when we use IN instead of EXISTS apart from the performance as:

Select *
From Customer
Where Customer.Id IN (Select o.cust_Id from Order o )

Do the three above queries return exactly the same records?

Update: How does really the EXISTS evaluation works in the second query (or the first), considering that it checks only if the Subquery returns true or false? What is the "interpretation" of the query i.e.?

Select *
From Customer c
Where Exists (True)

Upvotes: 2

Views: 285

Answers (3)

Matthew
Matthew

Reputation: 10444

Yes, each of those three should return identical result sets. Your second query is incorrect, as @ypercube points out in the commends. You're checking whether an uncorrellated subquery EXISTS

Of the two that work (1, 3), I'd expect #3 to be the fastest depending on your tables because it only executes the subquery one time.

However your most effective result is probably none of them but this:

SELECT DISTINCT
    c.*
FROM
    Customer c
JOIN
    Orders o
    ON o.[cust_id] = c.[Id]

because it should just be an index scan and a hash.

You should check the query plans and/or benchmark each one.

Upvotes: 2

CharlesC
CharlesC

Reputation: 350

The best way to execute that query is to add orders to the from clause and join to it.

select distinct c.* 
from customers c, 
orders o
where c.id = o.cust_id

Your other queries may be more inefficient (depending on the shape of the data) but they should all return the same result set.

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1270513

The first two queries are different.

The first has a correlated subquery and will return what you want -- information about customers who have an order.

The second has an uncorrelated subquery. It will return either all customers or no customers, depending on whether or not any customers have placed an order.

The third query is an alternative way of expressing what you want.

The only possible issue that I can think of would arise when cust_id might have NULL values. In such a case, the first and third queries may not return the same results.

Upvotes: 3

Related Questions