Reputation: 8352
I have a table files
with files and a table reades
with read accesses to these files. In the table reades
there is a column file_id
where refers to the respective column in files
.
Now I would like to list all files which have not been accessed and tried this:
SELECT * FROM files WHERE file_id NOT IN (SELECT file_id FROM reades)
This is terribly slow. The reason is that mySQL thinks that the subquery is dependent on the query:
+----+--------------------+--------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------------+--------+------+---------------+------+---------+------+------+----------+-------------+
| 1 | PRIMARY | files | ALL | NULL | NULL | NULL | NULL | 1053 | 100.00 | Using where |
| 2 | DEPENDENT SUBQUERY | reades | ALL | NULL | NULL | NULL | NULL | 3242 | 100.00 | Using where |
+----+--------------------+--------+------+---------------+------+---------+------+------+----------+-------------+
But why? The subquery is completely independent and more or less just meant to return a list of ids.
(To be precise: Each file_id
can appear multiple times in reades
, of course, as there can be arbitrarily many read operations for each file.)
Upvotes: 4
Views: 1416
Reputation: 1336
IN-subqueries are in MySQL 5.5 and earlier converted to EXIST subqueries. The given query will be converted to the following query:
SELECT * FROM files WHERE NOT EXISTS (SELECT 1 FROM reades WHERE reades.filed_id = files.file_id)
As you see, the subquery is actually dependent.
MySQL 5.6 may choose to materialize the subquery. That is, first, run the inner query and store the result in a temporary table (removing duplicates). Then, it can use a join-like operation between the outer table (i.e., files) and the temporary table to find the rows with no match. This way of executing the query will probably be more optimal if reades.file_id is not indexed.
However, if reades.file_id is indexed, the traditional IN-to-EXISTS execution strategy is actually pretty efficient. In that case, I would not expect any significant performance improvement from converting the query into a join as suggested in other answers. MySQL 5.6 optimizer makes a cost-based choice between materialization and IN-to-EXISTS execution.
Upvotes: 0
Reputation: 8352
Looking at this page I found two possible solutions which both work. Just for completeness I add one of those, similar to the answers with JOINs shown above, but it is fast even without using foreign keys:
SELECT * FROM files AS f
INNER JOIN (SELECT DISTINCT file_id FROM reades) AS r
ON f.file_id = r.file_id
This solves the problem, but still this does not answer my question :)
EDIT: If I interpret the EXPLAIN output correctly, this is fast, because the interpreter generates a temporary index:
+----+-------------+------------+--------+---------------+---------+---------+-----------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+--------+---------------+---------+---------+-----------+------+--------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 843 | |
| 1 | PRIMARY | f | eq_ref | PRIMARY | PRIMARY | 4 | r.file_id | 1 | |
| 2 | DERIVED | reades | range | NULL | file_id | 5 | NULL | 811 | Using index for group-by |
+----+-------------+------------+--------+---------------+---------+---------+-----------+------+--------------------------+
Upvotes: 0
Reputation: 12226
i've seen this before. it's a bug in mysql. try this:
SELECT * FROM files WHERE file_id NOT IN (SELECT * FROM (SELECT file_id FROM reades))
there bug report is here: http://bugs.mysql.com/bug.php?id=25926
Upvotes: 4
Reputation: 22064
Does MySQL support EXISTS in the same way that MSSQL would? If so, you could rewrite the query as
SELECT * FROM files as f WHERE file_id NOT EXISTS (SELECT 1 FROM reades r WHERE r.file_id = f.file_id)
Using IN is horribly inefficient as it runs that subquery for each row in the parent query.
Upvotes: 0
Reputation: 238166
Try replacing the subquery with a join:
SELECT *
FROM files f
LEFT OUTER JOIN reades r on r.file_id = f.file_id
WHERE r.file_id IS NULL
Here's a link to an article about this problem. The writer of that article wrote a stored procedure to force MySQL to evaluate subqueries as independant. I doubt that's necessary in this case though.
Upvotes: 4
Reputation: 124315
Try:
SELECT * FROM files WHERE file_id NOT IN (SELECT reades.file_id FROM reades)
That is: if it's coming up as dependent, perhaps that's because of ambiguity in what file_id
refers to, so let's try fully qualifying it.
If that doesn't work, just do:
SELECT files.*
FROM files
LEFT JOIN reades
USING (file_id)
WHERE reades.file_id IS NULL
Upvotes: 2