vilanovi
vilanovi

Reputation: 2117

Does the order of JOIN vs WHERE in SQL affect performance?

In SQL, how much the order of JOIN vs WHERE affect the performance of a query?

a) SELECT […] FROM A JOIN ( SELECT […] FROM B WHERE CONDITION ) ON […]

b) SELECT […] FROM A JOIN ( SELECT […] FROM B ) ON […] WHERE CONDITION

My inner feeling tells me that option a) should be more performant: if we do first a join and then we run a where, it seems way less performant than first running a where on one table, and from the resutls doing a join. But I’m not sure as this depends on the internal optimizations of the SQL library itself.

Would be nice to know if the behavior is the same for both MySQL and PostgreSQL, and also if it depends on any other decorators as group by or order by.

Upvotes: 1

Views: 360

Answers (2)

Gordon Linoff
Gordon Linoff

Reputation: 1271023

Postgres has a smart optimizer so the two versions should have similar execution plans, under most cases (I'll return to that in a moment).

MySQL has a tendency to materialize subqueries. Although this has gotten better in more recent versions, I still recommend avoiding it. Materializing subqueries prevents the use of indexes and can have a significant impact on performance.

One caveat: If the subquery is complicated, then it might be better to filter as part of the subquery. For instance, if it is an aggregation, then filtering before aggregating usually results in better performance. That said, Postgres is smart about pushing conditions into the subquery. So, if the outer filtering is on a key used in aggregation, Postgres is smart enough to push the condition into the subquery.

Upvotes: 1

Tim Biegeleisen
Tim Biegeleisen

Reputation: 522741

All other factors being equal, I would expect the A version to perform better than the B version, as you also seem to expect. The main reason for this is that the A version lets the database throw out rows using the WHERE clause in the subquery. Then the join only has to involve a smaller intermediate table. The exact difference in performance between the two would depend on the underlying data and the actual queries. Note that it is even possible that both queries could be optimized under the hood to the same or very similar execution plan.

Upvotes: 1

Related Questions