Reputation: 341
the two bits of SQL below get the same result
SELECT c.name, o.product
FROM customer c, order o
WHERE c.id = o.cust_id
AND o.value = 150
SELECT c.name, o.product
FROM customer c
INNER JOIN order o on c.id = o.cust_id
WHERE o.value = 150
I've seen both styles used as standard at different companies. From what I've seen, the 2nd one is what most people recommend online. Is there any real reason for this other than style? Does using an Inner Join sometimes have better performance?
I've noticed Ingres and Oracle developers tend to use the first style, whereas Microsoft SQL Server users have tended to use the second, but that might just be a coincidence.
Thanks for any insight, I've wondered about this for a while.
Edit: I've changed the title from 'SQL Inner Join versus Cartesian Product' as I was using the incorrect terminlogy. Thanks for all the responses so far.
Upvotes: 29
Views: 32055
Reputation: 153760
An INNER JOIN statement can be rewritten as a CROSS JOIN with a WHERE clause matching the same condition you used in the ON clause of the INNER JOIN query.
Considering we have the following post
and post_comment
tables:
The post
has the following records:
| id | title |
|----|-----------|
| 1 | Java |
| 2 | Hibernate |
| 3 | JPA |
and the post_comment
has the following three rows:
| id | review | post_id |
|----|-----------|---------|
| 1 | Good | 1 |
| 2 | Excellent | 1 |
| 3 | Awesome | 2 |
The SQL JOIN clause allows you to associate rows that belong to different tables. For instance, a CROSS JOIN will create a Cartesian Product containing all possible combinations of rows between the two joining tables.
While the CROSS JOIN is useful in certain scenarios, most of the time, you want to join tables based on a specific condition. And, that's where INNER JOIN comes into play.
The SQL INNER JOIN allows us to filter the Cartesian Product of joining two tables based on a condition that is specified via the ON clause.
If you provide an "always true" condition, the INNER JOIN will not filter the joined records, and the result set will contain the Cartesian Product of the two joining tables.
For instance, if we execute the following SQL INNER JOIN query:
SELECT
p.id AS "p.id",
pc.id AS "pc.id"
FROM post p
INNER JOIN post_comment pc ON 1 = 1
We will get all combinations of post
and post_comment
records:
| p.id | pc.id |
|---------|------------|
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 2 | 1 |
| 2 | 2 |
| 2 | 3 |
| 3 | 1 |
| 3 | 2 |
| 3 | 3 |
So, if the ON clause condition is "always true", the INNER JOIN is simply equivalent to a CROSS JOIN query:
SELECT
p.id AS "p.id",
pc.id AS "pc.id"
FROM post p
CROSS JOIN post_comment
WHERE 1 = 1
ORDER BY p.id, pc.id
On the other hand, if the ON clause condition is "always false", then all the joined records are going to be filtered out and the result set will be empty.
So, if we execute the following SQL INNER JOIN query:
SELECT
p.id AS "p.id",
pc.id AS "pc.id"
FROM post p
INNER JOIN post_comment pc ON 1 = 0
ORDER BY p.id, pc.id
We won't get any result back:
| p.id | pc.id |
|---------|------------|
That's because the query above is equivalent to the following CROSS JOIN query:
SELECT
p.id AS "p.id",
pc.id AS "pc.id"
FROM post p
CROSS JOIN post_comment
WHERE 1 = 0
ORDER BY p.id, pc.id
The most common ON clause condition is the one that matches the Foreign Key column in the child table with the Primary Key column in the parent table, as illustrated by the following query:
SELECT
p.id AS "p.id",
pc.post_id AS "pc.post_id",
pc.id AS "pc.id",
p.title AS "p.title",
pc.review AS "pc.review"
FROM post p
INNER JOIN post_comment pc ON pc.post_id = p.id
ORDER BY p.id, pc.id
When executing the above SQL INNER JOIN query, we get the following result set:
| p.id | pc.post_id | pc.id | p.title | pc.review |
|---------|------------|------------|------------|-----------|
| 1 | 1 | 1 | Java | Good |
| 1 | 1 | 2 | Java | Excellent |
| 2 | 2 | 3 | Hibernate | Awesome |
So, only the records that match the ON clause condition are included in the query result set. In our case, the result set contains all the post
along with their post_comment
records. The post
rows that have no associated post_comment
are excluded since they can not satisfy the ON Clause condition.
Again, the above SQL INNER JOIN query is equivalent to the following CROSS JOIN query:
SELECT
p.id AS "p.id",
pc.post_id AS "pc.post_id",
pc.id AS "pc.id",
p.title AS "p.title",
pc.review AS "pc.review"
FROM post p, post_comment pc
WHERE pc.post_id = p.id
The non-struck rows are the ones that satisfy the WHERE clause, and only these records are going to be included in the result set. That's the best way to visualize how the INNER JOIN clause works.
| p.id | pc.post_id | pc.id | p.title | pc.review | |------|------------|-------|-----------|-----------| | 1 | 1 | 1 | Java | Good | | 1 | 1 | 2 | Java | Excellent || 1 | 2 | 3 | Java | Awesome || 2 | 1 | 1 | Hibernate | Good || 2 | 1 | 2 | Hibernate | Excellent || 2 | 2 | 3 | Hibernate | Awesome || 3 | 1 | 1 | JPA | Good || 3 | 1 | 2 | JPA | Excellent || 3 | 2 | 3 | JPA | Awesome |
Not that this only applies to INNER JOIN, not for OUTER JOIN.
Upvotes: 2
Reputation: 15971
Both queries are an inner joins and equivalent. The first is the older method of doing things, whereas the use of the JOIN syntax only became common after the introduction of the SQL-92 standard (I believe it's in the older definitions, just wasn't particularly widely used before then).
The use of the JOIN syntax is strongly preferred as it separates the join logic from the filtering logic in the WHERE clause. Whilst the JOIN syntax is really syntactic sugar for inner joins it's strength lies with outer joins where the old * syntax can produce situations where it is impossible to unambiguously describe the join and the interpretation is implementation-dependent. The [LEFT | RIGHT] JOIN syntax avoids these pitfalls, and hence for consistency the use of the JOIN clause is preferable in all circumstances.
Note that neither of these two examples are Cartesian products. For that you'd use either
SELECT c.name, o.product
FROM customer c, order o
WHERE o.value = 150
or
SELECT c.name, o.product
FROM customer c CROSS JOIN order o
WHERE o.value = 150
Upvotes: 29
Reputation: 3622
Both queries are performing an inner join, just different syntax.
Upvotes: 3
Reputation:
The JOIN... ON... syntax is a more recent addition to ANSI and ISO specs for SQL. The JOIN... ON... syntax is generally preferred because it 1) moves the join criteria out of the WHERE clause making the WHERE clause just for filtering and 2) makes it more obvious if you are creating a dreaded Cartesian product since each JOIN must be accompanied by at least one ON clause. If all the join criteria are just ANDed in the WHERE clause, it's not as obvious when one or more is missing.
Upvotes: 3
Reputation: 34391
Oracle was late in supporting the JOIN ... ON (ANSI) syntax (not until Oracle 9), that's why Oracle developers often don't use it.
Personally, I prefer using ANSI syntax when it is logically clear that one table is driving the query and the others are lookup tables. When tables are "equal", I tend to use the cartesian syntax.
The performance should not differ at all.
Upvotes: 4
Reputation: 24281
To answer part of your question, I think early bugs in the JOIN ... ON syntax in Oracle discouraged Oracle users away from that syntax. I don't think there are any particular problems now.
They are equivalent and should be parsed into the same internal representation for optimization.
Upvotes: 6
Reputation: 35247
Actually these examples are equivalent and neither is a cartesian product. A cartesian product is returned when you join two tables without specifying a join condition, such as in
select *
from t1,t2
There is a good discussion of this on Wikipedia.
Upvotes: 4