Carl
Carl

Reputation: 1

Simple mysql SELECT with IN-clause (which returns an empty set) runs forever

On my MariaDB I have a table, 'cv_attribute', which contains 296k records and has 8 columns:

+----------------+--------------+------+-----+---------+-------+
| Field          | Type         | Null | Key | Default | Extra |
+----------------+--------------+------+-----+---------+-------+
| cvId           | varchar(255) | NO   | MUL | NULL    |       |
| updateId       | bigint(20)   | NO   | MUL | NULL    |       |
| attributeId    | varchar(255) | YES  |     | NULL    |       |
| attributeType  | varchar(255) | YES  |     | NULL    |       |
| attributeLang  | varchar(255) | YES  |     | NULL    |       |
| attributeValue | varchar(255) | YES  |     | NULL    |       |
| MD5            | varchar(255) | YES  | MUL | NULL    |       |
| ingestId       | bigint(20)   | YES  |     | NULL    |       |
+----------------+--------------+------+-----+---------+-------+

I was given a script that runs the query

SELECT DISTINCT * FROM cv_attribute WHERE `MD5` IN (SELECT `MD5` FROM cv_attribute GROUP BY `MD5` HAVING COUNT(*) > 1);

but this runs forever. As in: still running even after a week!

Notice that the query contains an IN-clause. When running the query within that IN-clause, it takes 11 seconds and returns no results:

SELECT `MD5` FROM cv_attribute GROUP BY `MD5` HAVING COUNT(*) > 1;
Empty set (11.51 sec)

I'm totally clueless how the complete query (which basically is just a SELECT DISTINCT *) never finishes, while it's IN-clause returns an empty set within 11 seconds. So, to my understanding, this SELECT DISTINCT * would just query an empty list and be pretty fast.

Note, this is the explain of the full query:

explain SELECT DISTINCT * FROM cv_attribute WHERE `MD5` IN (SELECT `MD5` FROM cv_attribute GROUP BY `MD5` HAVING COUNT(*) > 1);
+------+--------------------+--------------+------+---------------+------+---------+------+--------+------------------------------+
| id   | select_type        | table        | type | possible_keys | key  | key_len | ref  | rows   | Extra                        |
+------+--------------------+--------------+------+---------------+------+---------+------+--------+------------------------------+
|    1 | PRIMARY            | cv_attribute | ALL  | NULL          | NULL | NULL    | NULL | 505223 | Using where; Using temporary |
|    2 | DEPENDENT SUBQUERY | cv_attribute | ALL  | NULL          | NULL | NULL    | NULL | 505223 | Using temporary              |
+------+--------------------+--------------+------+---------------+------+---------+------+--------+------------------------------+
2 rows in set (0.00 sec)

I hope anybody here would see the light!

Thanks a lot! Carl

Upvotes: 0

Views: 96

Answers (0)

Related Questions