Wenhao Ji
Wenhao Ji

Reputation: 5151

NATURAL JOIN vs WHERE IN Clauses

Recently, I dealt with retrieving a large amount of data which consists of thousands of records from a MySQL database. Since it was my first time to handle such large data set, I didn't think about the efficiency of the SQL statement. And the problem comes.

Here are the tables of the database (It is just a simple database model of a curriculum system):

course:

+-----------+---------------------+------+-----+---------+----------------+
| Field     | Type                | Null | Key | Default | Extra          |
+-----------+---------------------+------+-----+---------+----------------+
| course_id | int(10) unsigned    | NO   | PRI | NULL    | auto_increment |
| name      | varchar(20)         | NO   |     | NULL    |                |
| lecturer  | varchar(20)         | NO   |     | NULL    |                |
| credit    | float               | NO   |     | NULL    |                |
| week_from | tinyint(3) unsigned | NO   |     | NULL    |                |
| week_to   | tinyint(3) unsigned | NO   |     | NULL    |                |
+-----------+---------------------+------+-----+---------+----------------+

select:

+-----------+------------------+------+-----+---------+----------------+
| Field     | Type             | Null | Key | Default | Extra          |
+-----------+------------------+------+-----+---------+----------------+
| select_id | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| card_no   | int(10) unsigned | NO   |     | NULL    |                |
| course_id | int(10) unsigned | NO   |     | NULL    |                |
| term      | varchar(7)       | NO   |     | NULL    |                |
+-----------+------------------+------+-----+---------+----------------+

When I want to retrieve all the courses that a student has selected (with his card number), the SQL statement is

SELECT course_id, name, lecturer, credit, week_from, week_to
FROM `course` WHERE course_id IN (
    SELECT course_id FROM `select` WHERE card_no=<student's card number>
);

But, it was extremely slow and it didn't return anything for a long time. So I changed WHERE IN clauses into NATURAL JOIN. Here is the SQL,

SELECT course_id, name, lecturer, credit, week_from, week_to
FROM `select` NATURAL JOIN `course`
WHERE card_no=<student's card number>;

It returns immediately and works fine!

So my question is:

Upvotes: 11

Views: 9013

Answers (2)

Carl
Carl

Reputation: 509

Try this:

SELECT course_id, name, lecturer, credit, week_from, week_to
FROM `course` c
WHERE c.course_id IN (
    SELECT s.course_id 
    FROM `select` s
    WHERE card_no=<student's card number>
    AND   c.course_id = s.course_id
);

Notice the addition of the AND clause in the sub-query. This is called a co-related sub-query because it relates the two course_ids, just as the NATURAL JOIN does.

I think Barmar's index explanation is on the mark.

Upvotes: 4

Barmar
Barmar

Reputation: 780974

Theoretically the two queries are equivalent. I think it's just poor implementation of the MySQL query optimizer that causes JOIN to be more efficient than WHERE IN. So I always use JOIN.

Have you looked at the output of EXPLAIN for the two queries? Here's what I got for a WHERE IN:

+----+--------------------+-------------------+----------------+-------------------+---------+---------+------------+---------+--------------------------+
|  1 | PRIMARY            | t_users           | ALL            | NULL              | NULL    | NULL    | NULL       | 2458304 | Using where              |
|  2 | DEPENDENT SUBQUERY | t_user_attributes | index_subquery | PRIMARY,attribute | PRIMARY | 13      | func,const |       7 | Using index; Using where |
+----+--------------------+-------------------+----------------+-------------------+---------+---------+------------+---------+--------------------------+

It's apparently performing the subquery, then going through every row in the main table testing whether it's in -- it doesn't use the index. For the JOIN I get:

+----+-------------+-------------------+--------+---------------------+-----------+---------+---------------------------------------+------+-------------+
| id | select_type | table             | type   | possible_keys       | key       | key_len | ref                                   | rows | Extra       |
+----+-------------+-------------------+--------+---------------------+-----------+---------+---------------------------------------+------+-------------+
|  1 | SIMPLE      | t_user_attributes | ref    | PRIMARY,attribute   | attribute | 1       | const                                 |   15 | Using where |
|  1 | SIMPLE      | t_users           | eq_ref | username,username_2 | username  | 12      | bbodb_test.t_user_attributes.username |    1 |             |
+----+-------------+-------------------+--------+---------------------+-----------+---------+---------------------------------------+------+-------------+

Now it uses the index.

Upvotes: 5

Related Questions