Reputation: 87
Here is a simple query.
select * where 'id' in [1,2,3,4,5] AND 'sub_id' in [1];
If I switch these two wherein condition to this,
select * where 'sub_id' in [1] AND 'id' in [1,2,3,4,5];
With my first intuition, those two queries seem to have the same time complexity.
However, if I suppose 'id' column has 'N' data(extremely large), then the average time for the first query would take 5n
, assuming 'sub_id' in [1]
part would be O(1), since there are only 5 records to traverse, negligibly small.
On the other hand, second query would take only 1n
, since I am only searching for 1 record among N
data(assuming 'id' in [1,2,3,4,5]
part would be negligibly small too).
Is my assumption hypothetically correct? Assuming 'id' is not pk, thus no index is set.
So, my real question is, when I put wherein (...)
condition in SQL, should I put smaller range first then only larger range in order to make the performance better?
Please fix me if I'm wrong.
Upvotes: 0
Views: 1596
Reputation: 522007
In the case of your current query, the order of which of the 2 ANDed terms in the WHERE
clause gets evaluated first is moot, because by definition the SQL engine must check all parts of the AND
expression to make a decision. If the query had ORed terms, then your ultimate question would make more sense. In that case, if you were to examine the execution plan, your query could evaluate in either of the 2 following orders:
SELECT * FROM yourTable WHERE id IN (1, 2, 3, 4, 5) OR sub_id IN (1);
SELECT * FROM yourTable WHERE sub_id IN (1) OR id IN (1, 2, 3, 4, 5);
In this case, how you write the SQL is also a moot point, because your SQL optimizer most likely would be smart enough to figure out the fastest way to evaluate the SQL.
Your responsibility as a data engineer would be to do something like put an index on (sub_id, id)
, or maybe (id, sub_id)
, which if used would speed up the execution of the WHERE
clause.
Upvotes: 1