SELECT ... WHERE IN vs JOIN

Question

Assume we have 4 tables A, B, C, D in some unspecified relational SQL database. A references B, also C and D as well. References means that A has column A.refX_id = X.id, X is A, B and C (common foreign key 1:N).

What I want, is to query table A with condition based on columns from all child tables B, C and D. My question is: What of the following variants is generally better? (In terms of usability, efficiency, speed.)

Variant 1:

SELECT DISTINCT A.* FROM A
   JOIN B ON A.refB_id = B.id
   JOIN C ON A.refC_id = C.id
   JOIN D ON A.refD_id = D.id
   WHERE  AND  AND ;

Which I like more from perspective of a database, but looks a little bit harder to program.

Variant 2:

SELECT id FROM B WHERE ; # result store to array "BIds" on program side
SELECT id FROM C WHERE ; # result store to array "CIds" on program side
SELECT id FROM D WHERE ; # result store to array "DIds" on program side

SELECT A.* FROM A
   WHERE refB_id IN () AND refC_id IN () AND refD_id IN ();

#  menas expand whole array of ids, which can result in a very long query string

I thought that Variant 2 is completelly smut and unusable with potentially large data. But I've heard, that a lot of frameworks normally use it, because it is relativelly straightforward. Is it sort of legal way to query data like this in general case, if I know that the content of the "IN" clause is taken from a result of another query(ies)?

Gordon Linoff · Accepted Answer

I would encourage you to use IN or EXISTS:

SELECT A.*
FROM A
WHERE EXISTS (SELECT 1 FROM B WHERE A.refB_id = B.id AND ) AND
      EXISTS (SELECT 1 FROM C WHERE A.refC_id = C.id AND ) AND
      EXISTS (SELECT 1 FROM D WHERE A.refD_id = D.id AND );

Advantages of this approach:

No danger of getting large intermediate Cartesian products.
No expense of eliminating duplicates for SELECT DISTINCT.
Most databases handle EXISTS pretty well.
You can optimize each subquery using indexes.

EDIT:

You can write this with IN and subqueries:

SELECT A.*
FROM A
WHERE A.refB_id IN (SELECT B.id FROM B WHERE ) AND
      A.refC_id IN (SELECT C.id FROM C WHERE ) AND
      A.refD_id IN (SELECT D.id FROM D WHERE );

SELECT ... WHERE IN vs JOIN

Answers (2)

Related Questions