soulmerge
soulmerge

Reputation: 75774

Performance of Cross join with WHERE clause compared to Inner join

The effect of issuing an inner join is the same as stating a cross join with the join condition in the WHERE-clause. I noticed that many people in my company use cross joins, where I would use inner joins. I didn't notice any significant performance gain after changing some of these queries and was wondering if it was just a coincidence or if the DBMS optimizes such issues transparently (MySql in our case). And here a concrete example for discussion:

SELECT User.*
FROM User, Address
WHERE User.addressId = Address.id;

SELECT User.*
FROM User
INNER JOIN Address ON (User.addressId = Address.id);

Upvotes: 80

Views: 77671

Answers (11)

Nihar
Nihar

Reputation: 1

There is not much difference between cross join with where clause and inner join they are probably the same but with the use of inner join we would decrease some part in code.

Upvotes: -1

karim79
karim79

Reputation: 342775

Cross Joins produce results that consist of every combination of rows from two or more tables. That means if table A has 6 rows and table B has 3 rows, a cross join will result in 18 rows. There is no relationship established between the two tables – you literally just produce every possible combination.

With an inner join, column values from one row of a table are combined with column values from another row of another (or the same) table to form a single row of data.

If a WHERE clause is added to a cross join, it behaves as an inner join as the WHERE imposes a limiting factor.

As long as your queries abide by common sense and vendor specific performance guidelines (i), I like to think of the decision on which type of join to use to be a simple matter of taste.

(i) Vendor Specific Performance Guidelines

  1. MySQL Performance Tuning and Optimization Resources
  2. PostgreSQL Performance Optimization

Upvotes: 73

minhas23
minhas23

Reputation: 9671

Explaining both queries gives same output

mysql> explain select * from t T1, t T2 where T1.ID=T2.ID;
+----+-------------+-------+------+---------------+------+---------+------+------+--------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                          |
+----+-------------+-------+------+---------------+------+---------+------+------+--------------------------------+
|  1 | SIMPLE      | T1    | ALL  | PRIMARY       | NULL | NULL    | NULL |    3 |                                |
|  1 | SIMPLE      | T2    | ALL  | PRIMARY       | NULL | NULL    | NULL |    3 | Using where; Using join buffer |
+----+-------------+-------+------+---------------+------+---------+------+------+--------------------------------+
2 rows in set (0.00 sec)

mysql> explain select * from t T1  join t T2 on T1.ID=T2.ID;
+----+-------------+-------+------+---------------+------+---------+------+------+--------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                          |
+----+-------------+-------+------+---------------+------+---------+------+------+--------------------------------+
|  1 | SIMPLE      | T1    | ALL  | PRIMARY       | NULL | NULL    | NULL |    3 |                                |
|  1 | SIMPLE      | T2    | ALL  | PRIMARY       | NULL | NULL    | NULL |    3 | Using where; Using join buffer |
+----+-------------+-------+------+---------------+------+---------+------+------+--------------------------------+
2 rows in set (0.00 sec)

But using inner join syntax is preferable as its more clearer and more precise. Mysql may internally tune Left and Right join queries to select less data as compared to Cross Join.

Upvotes: 3

Tegiri Nenashi
Tegiri Nenashi

Reputation: 3086

Since the beginning of time optimizers have being built around classic restrict-project-cartesian product syntax. Virtually all the vendors copied the design pioneered by System R. Then, grudgingly, vendors adopted "the latest-and-greatest" ANSI syntax and retrofitted their SQL execution engines. Contrary to what marketing brochure can tell you ("use the latest syntax"), not much on physical implementation level has been changed: it is still [indexed] nested loops, or hash or sort-merge join. Therefore, there is no basis to assume superiority of one syntax over the other.

To my personal taste, the new syntax is redundant, noisy, and inconsistent. As to being sanctioned by the committee, "walk into any park in every city and you'll find no statue of committee".

Upvotes: -1

nisiumi
nisiumi

Reputation: 354

SQL Server said "When a WHERE turns a Cross Join into an Inner Join", so there are not difference. http://msdn.microsoft.com/en-us/library/ms190690.aspx

I did SQL server "Execution plan" the Performance is same.

Upvotes: 3

Ozgur Ozturk
Ozgur Ozturk

Reputation: 1305

One additional benefit of the first syntax is you can be more general in your limiting condition. Not just equality.

But if you are using equality, why trust the optimizer? Make sure it won't first generate the cross join and then eliminate rows. Use the second one.

Upvotes: 1

Jonathan
Jonathan

Reputation: 26649

I find that work-places that allow the first syntax (comma separated tables) tend to have significant time taken up debugging cases where more rows are returned than intended. Unintentional cross joins are the bane of a system, and can bring even the most well-tuned database to it's knees. It has brought our pre-prod system to a screeching halt on at least two occasions in the last year.

The second syntax (join syntax) forces the writer to think about how the tables are joined together first, and then only return the interesting rows. It is impossible to accidentally do a cross join using this syntax, and thus the danger of accidental poorly performing queries is reduced.

However, that issue aside, I have never noticed any speed difference between the two syntaxes in any systems I have had.

Upvotes: 12

HLGEM
HLGEM

Reputation: 96650

The first example is functionally the same as the second example. However, this syntax should be avoided for several reasons. First it is much easier to accidentally get a cross join when using this syntax especially when there are mulitple joins in the table. If you see a lot of this type of query with the keyword distinct, you probably have someone who is trying to fix the cross joins.

Next, the left and right join syntax using the older style is deprecated and will no longer be supported. Further, it doesn't work correctly now anyway. Sometimes it misinterprets the outer join and sends back the wrong results set. So any queries you have using = or = in the where clause should immediately be replaced.

Third, ANSI standard joins are easier to understand and maintain. An understanding of joins is one of the most critical basic skills that anyone querying any relational database needs to have. It has been my experience that some people who use the older style don't really understand joins and how they work and thus write queries that do not actually do what they intended.

Upvotes: 2

Brimstedt
Brimstedt

Reputation: 3140

The order in which you join tables or you put your ON / WHERE conditions should not matter.

The query optimizer should optimize and use the best order anyway (and chosing how to best filter the data, where to start, etc)

As many others though, I suggest using the INNER JOIN syntax, since it makes things much more readable, it is more transparent with the syntax of LEFT or FULL joins as well.

There's a somewhat more dwelling text about it here: http://linus.brimstedt.se/?/article/articleview/SQL Syntax

/B

Upvotes: 1

Emil H
Emil H

Reputation: 40220

Use EXPLAIN to view the query plan for both queries, and see if there's any difference. Quite possibly MySQL will use the same execution plan in both cases. I use the INNER JOIN syntax mainly because it's a lot clearer.

Upvotes: 20

Otávio Décio
Otávio Décio

Reputation: 74310

There is no difference other than the inner join is a lot clearer because it defines the join, leaving the where clause to be the actual limiting condition.

Upvotes: 30

Related Questions