Reputation: 5151
Recently, I dealt with retrieving a large amount of data which consists of thousands of records from a MySQL database. Since it was my first time to handle such large data set, I didn't think about the efficiency of the SQL statement. And the problem comes.
Here are the tables of the database (It is just a simple database model of a curriculum system):
course:
+-----------+---------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+---------------------+------+-----+---------+----------------+
| course_id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| name | varchar(20) | NO | | NULL | |
| lecturer | varchar(20) | NO | | NULL | |
| credit | float | NO | | NULL | |
| week_from | tinyint(3) unsigned | NO | | NULL | |
| week_to | tinyint(3) unsigned | NO | | NULL | |
+-----------+---------------------+------+-----+---------+----------------+
select:
+-----------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+------------------+------+-----+---------+----------------+
| select_id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| card_no | int(10) unsigned | NO | | NULL | |
| course_id | int(10) unsigned | NO | | NULL | |
| term | varchar(7) | NO | | NULL | |
+-----------+------------------+------+-----+---------+----------------+
When I want to retrieve all the courses that a student has selected (with his card number), the SQL statement is
SELECT course_id, name, lecturer, credit, week_from, week_to
FROM `course` WHERE course_id IN (
SELECT course_id FROM `select` WHERE card_no=<student's card number>
);
But, it was extremely slow and it didn't return anything for a long time.
So I changed WHERE IN
clauses into NATURAL JOIN
. Here is the SQL,
SELECT course_id, name, lecturer, credit, week_from, week_to
FROM `select` NATURAL JOIN `course`
WHERE card_no=<student's card number>;
It returns immediately and works fine!
So my question is:
NATURAL JOIN
and WHERE IN
Clauses?INDEX
?)NATURAL JOIN
or WHERE IN
?Upvotes: 11
Views: 9013
Reputation: 509
Try this:
SELECT course_id, name, lecturer, credit, week_from, week_to
FROM `course` c
WHERE c.course_id IN (
SELECT s.course_id
FROM `select` s
WHERE card_no=<student's card number>
AND c.course_id = s.course_id
);
Notice the addition of the AND clause in the sub-query. This is called a co-related sub-query because it relates the two course_ids, just as the NATURAL JOIN does.
I think Barmar's index explanation is on the mark.
Upvotes: 4
Reputation: 780974
Theoretically the two queries are equivalent. I think it's just poor implementation of the MySQL query optimizer that causes JOIN to be more efficient than WHERE IN. So I always use JOIN.
Have you looked at the output of EXPLAIN for the two queries? Here's what I got for a WHERE IN
:
+----+--------------------+-------------------+----------------+-------------------+---------+---------+------------+---------+--------------------------+
| 1 | PRIMARY | t_users | ALL | NULL | NULL | NULL | NULL | 2458304 | Using where |
| 2 | DEPENDENT SUBQUERY | t_user_attributes | index_subquery | PRIMARY,attribute | PRIMARY | 13 | func,const | 7 | Using index; Using where |
+----+--------------------+-------------------+----------------+-------------------+---------+---------+------------+---------+--------------------------+
It's apparently performing the subquery, then going through every row in the main table testing whether it's in -- it doesn't use the index. For the JOIN I get:
+----+-------------+-------------------+--------+---------------------+-----------+---------+---------------------------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------------+--------+---------------------+-----------+---------+---------------------------------------+------+-------------+
| 1 | SIMPLE | t_user_attributes | ref | PRIMARY,attribute | attribute | 1 | const | 15 | Using where |
| 1 | SIMPLE | t_users | eq_ref | username,username_2 | username | 12 | bbodb_test.t_user_attributes.username | 1 | |
+----+-------------+-------------------+--------+---------------------+-----------+---------+---------------------------------------+------+-------------+
Now it uses the index.
Upvotes: 5