Ben
Ben

Reputation: 68658

Are all these SQL joins logically equivalent?

I'm just wondering if all of the following joins are logically equivalent, and if not, why not?

SELECT t1.x, t2.y from t1, t2 where t1.a=t2.a and t1.b=t2.b and t1.c = t2.c;

SELECT t1.x, t2.y from t1 join t2 on t1.a=t2.a where t1.b=t2.b and t1.c = t2.c;

SELECT t1.x, t2.y from t1 join t2 on t1.a=t2.a and t1.b=t2.b where t1.c = t2.c;

SELECT t1.x, t2.y from t1 join t2 on t1.a=t2.a and t1.b=t2.b and t1.c = t2.c;

I guess my real question is: does combining "where" with "on" doing something different from just having multiple conditions ANDed together with "on"?

I work with MySQL, in case that makes a difference.

Upvotes: 5

Views: 620

Answers (4)

Constantin
Constantin

Reputation: 28204

For INNER JOIN it makes no logical difference and optimizer should produce same plans. But for OUTER joins it becomes important whether you put condition in WHERE or FROM ... JOIN clause. This is because FROM and ON clauses are processed before WHERE clause: ANSI SQL logical query processing http://www.sqlmag.com/Files/09/94378/Figure_01.jpg

Upvotes: 2

achinda99
achinda99

Reputation: 5078

They are logically equivalent. However, where you define the join conditions makes a difference as to how many records are used in the temporary table on which the where clause is applied. That is,

If table t1, t2 and t3 had 10 records each, the statement,

SELECT t1.x, t2.y from t1, t2 where t1.a=t2.a and t1.b=t2.b and t1.c = t2.c;

results in 1000 records of a permutation of the three tables records and then the where clause is applied.

For

SELECT t1.x, t2.y from t1 join t2 on t1.a=t2.a and t1.b=t2.b and t1.c = t2.c;

only ten records are in the temporary table before any where clause (none in this case) is applied. The second method is much faster when working with large tables.

Upvotes: 1

Bill Karwin
Bill Karwin

Reputation: 562891

Yes, as others have stated, the result is the same from all these queries.

FWIW, you can also use this shorthand syntax when you're doing an equi-join on column names that are the same in both tables:

SELECT t1.x, t2.y from t1 join t2 using (a, b, c);

As far as optimization, it should be optimized the same. That is, the RDBMS should be smart enough to analyze the WHERE syntax the same, and perform joins instead of generating an intermediate huge cross-join result and applying filtering conditions to it. This is such a common type of query, that it's also common for a given RDBMS implementation to recognize and optimize it.

In the case of MySQL, join and where are (kind of) evaluated together. Try using EXPLAIN to analyze your query. If the "type" column indicates "eq_ref" it means it's using an indexed join. This is the best type of join with respect to optimization. If "type" is "ref" it's good too.

You can get these join optimization types whether you put the condition in the JOIN...ON clause or the WHERE clause.

Upvotes: 2

Tony Andrews
Tony Andrews

Reputation: 132710

They are logically equivalent and should produce the same result. However, the last one is to be preferred as it states more correctly the semantics of the query - i.e. "join tables t1 and t2".

The WHERE clause should be used for "filtering" results of the join - e.g.

... WHERE t2.some_col > 10

Also, as Constantin has said in another answer, the 4 queries would be different if the join was an OUTER join.

Upvotes: 5

Related Questions