Reputation: 1065
Assume we have 4 tables A, B, C, D in some unspecified relational SQL database. A references B, also C and D as well. References means that A has column A.refX_id = X.id, X is A, B and C (common foreign key 1:N).
What I want, is to query table A with condition based on columns from all child tables B, C and D. My question is: What of the following variants is generally better? (In terms of usability, efficiency, speed.)
Variant 1:
SELECT DISTINCT A.* FROM A
JOIN B ON A.refB_id = B.id
JOIN C ON A.refC_id = C.id
JOIN D ON A.refD_id = D.id
WHERE <condition on B> AND <condition on C> AND <condition on D>;
Which I like more from perspective of a database, but looks a little bit harder to program.
Variant 2:
SELECT id FROM B WHERE <condition on B>; # result store to array "BIds" on program side
SELECT id FROM C WHERE <condition on C>; # result store to array "CIds" on program side
SELECT id FROM D WHERE <condition on D>; # result store to array "DIds" on program side
SELECT A.* FROM A
WHERE refB_id IN (<B_ids>) AND refC_id IN (<C_ids>) AND refD_id IN (<D_ids>);
# <B_ids> menas expand whole array of ids, which can result in a very long query string
I thought that Variant 2 is completelly smut and unusable with potentially large data. But I've heard, that a lot of frameworks normally use it, because it is relativelly straightforward. Is it sort of legal way to query data like this in general case, if I know that the content of the "IN" clause is taken from a result of another query(ies)?
Upvotes: 3
Views: 3478
Reputation: 1269803
I would encourage you to use IN
or EXISTS
:
SELECT A.*
FROM A
WHERE EXISTS (SELECT 1 FROM B WHERE A.refB_id = B.id AND <condition on B>) AND
EXISTS (SELECT 1 FROM C WHERE A.refC_id = C.id AND <condition on C>) AND
EXISTS (SELECT 1 FROM D WHERE A.refD_id = D.id AND <condition on D>);
Advantages of this approach:
SELECT DISTINCT
.EXISTS
pretty well.EDIT:
You can write this with IN
and subqueries:
SELECT A.*
FROM A
WHERE A.refB_id IN (SELECT B.id FROM B WHERE <condition on B>) AND
A.refC_id IN (SELECT C.id FROM C WHERE <condition on C>) AND
A.refD_id IN (SELECT D.id FROM D WHERE <condition on D>);
Upvotes: 4
Reputation: 77876
Not sure which framework uses the 2nd approach but first approach is what I will go for and moreover everyone else will. If you have proper index created on the join column for all tables then 1st approach will produce a much better plan than the 2nd one cause you have multiple IN
clause and what if every IN
has to work on millions of elements???.
Also, I would change the INNER JOIN
to LEFT JOIN
assuming that not all ID's may match and move the WHERE
conditions to JOIN ON
conditions like
SELECT DISTINCT A.* FROM A
LEFT JOIN B ON A.refB_id = B.id AND <condition on B>
LEFT JOIN C ON A.refC_id = C.id AND <condition on C>
LEFT JOIN D ON A.refD_id = D.id AND <condition on D>;
Upvotes: 3