Reputation: 8352

A subquery that should be independent is not. Why?

I have a table files with files and a table reades with read accesses to these files. In the table reades there is a column file_id where refers to the respective column in files.

Now I would like to list all files which have not been accessed and tried this:

SELECT * FROM files WHERE file_id NOT IN (SELECT file_id FROM reades)

This is terribly slow. The reason is that mySQL thinks that the subquery is dependent on the query:

+----+--------------------+--------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type        | table  | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+--------------------+--------+------+---------------+------+---------+------+------+----------+-------------+
|  1 | PRIMARY            | files  | ALL  | NULL          | NULL | NULL    | NULL | 1053 |   100.00 | Using where |
|  2 | DEPENDENT SUBQUERY | reades | ALL  | NULL          | NULL | NULL    | NULL | 3242 |   100.00 | Using where |
+----+--------------------+--------+------+---------------+------+---------+------+------+----------+-------------+

But why? The subquery is completely independent and more or less just meant to return a list of ids.

(To be precise: Each file_id can appear multiple times in reades, of course, as there can be arbitrarily many read operations for each file.)

Upvotes: 4

Answers (6)

Øystein Grøvlen

Reputation: 1336

IN-subqueries are in MySQL 5.5 and earlier converted to EXIST subqueries. The given query will be converted to the following query:

SELECT * FROM files WHERE NOT EXISTS (SELECT 1 FROM reades WHERE reades.filed_id = files.file_id)

As you see, the subquery is actually dependent.

MySQL 5.6 may choose to materialize the subquery. That is, first, run the inner query and store the result in a temporary table (removing duplicates). Then, it can use a join-like operation between the outer table (i.e., files) and the temporary table to find the rows with no match. This way of executing the query will probably be more optimal if reades.file_id is not indexed.

However, if reades.file_id is indexed, the traditional IN-to-EXISTS execution strategy is actually pretty efficient. In that case, I would not expect any significant performance improvement from converting the query into a join as suggested in other answers. MySQL 5.6 optimizer makes a cost-based choice between materialization and IN-to-EXISTS execution.

Upvotes: 0

fuenfundachtzig

Reputation: 8352

Looking at this page I found two possible solutions which both work. Just for completeness I add one of those, similar to the answers with JOINs shown above, but it is fast even without using foreign keys:

  SELECT * FROM files AS f 
    INNER JOIN (SELECT DISTINCT file_id FROM reades) AS r 
    ON f.file_id = r.file_id

This solves the problem, but still this does not answer my question :)

EDIT: If I interpret the EXPLAIN output correctly, this is fast, because the interpreter generates a temporary index:

+----+-------------+------------+--------+---------------+---------+---------+-----------+------+--------------------------+
| id | select_type | table      | type   | possible_keys | key     | key_len | ref       | rows | Extra                    |
+----+-------------+------------+--------+---------------+---------+---------+-----------+------+--------------------------+
|  1 | PRIMARY     | <derived2> | ALL    | NULL          | NULL    | NULL    | NULL      |  843 |                          |
|  1 | PRIMARY     | f          | eq_ref | PRIMARY       | PRIMARY | 4       | r.file_id |    1 |                          |
|  2 | DERIVED     | reades     | range  | NULL          | file_id | 5       | NULL      |  811 | Using index for group-by |
+----+-------------+------------+--------+---------------+---------+---------+-----------+------+--------------------------+

Upvotes: 0

longneck

Reputation: 12226

i've seen this before. it's a bug in mysql. try this:

SELECT * FROM files WHERE file_id NOT IN (SELECT * FROM (SELECT file_id FROM reades))

there bug report is here: http://bugs.mysql.com/bug.php?id=25926

Upvotes: 4

CaffGeek

Reputation: 22064

Does MySQL support EXISTS in the same way that MSSQL would? If so, you could rewrite the query as

SELECT * FROM files as f WHERE file_id NOT EXISTS (SELECT 1 FROM reades r WHERE r.file_id = f.file_id)

Using IN is horribly inefficient as it runs that subquery for each row in the parent query.

Upvotes: 0

Andomar

Reputation: 238166

Try replacing the subquery with a join:

SELECT * 
FROM files f
LEFT OUTER JOIN reades r on r.file_id = f.file_id
WHERE r.file_id IS NULL

Here's a link to an article about this problem. The writer of that article wrote a stored procedure to force MySQL to evaluate subqueries as independant. I doubt that's necessary in this case though.

Upvotes: 4

chaos

Reputation: 124315

Try:

SELECT * FROM files WHERE file_id NOT IN (SELECT reades.file_id FROM reades)

That is: if it's coming up as dependent, perhaps that's because of ambiguity in what file_id refers to, so let's try fully qualifying it.

If that doesn't work, just do:

SELECT files.*
FROM files
LEFT JOIN reades
USING (file_id)
WHERE reades.file_id IS NULL

Upvotes: 2

A subquery that should be independent is not. Why?

Answers (6)

Related Questions