Reputation: 1
On my MariaDB I have a table, 'cv_attribute', which contains 296k records and has 8 columns:
+----------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------------+--------------+------+-----+---------+-------+
| cvId | varchar(255) | NO | MUL | NULL | |
| updateId | bigint(20) | NO | MUL | NULL | |
| attributeId | varchar(255) | YES | | NULL | |
| attributeType | varchar(255) | YES | | NULL | |
| attributeLang | varchar(255) | YES | | NULL | |
| attributeValue | varchar(255) | YES | | NULL | |
| MD5 | varchar(255) | YES | MUL | NULL | |
| ingestId | bigint(20) | YES | | NULL | |
+----------------+--------------+------+-----+---------+-------+
I was given a script that runs the query
SELECT DISTINCT * FROM cv_attribute WHERE `MD5` IN (SELECT `MD5` FROM cv_attribute GROUP BY `MD5` HAVING COUNT(*) > 1);
but this runs forever. As in: still running even after a week!
Notice that the query contains an IN-clause. When running the query within that IN-clause, it takes 11 seconds and returns no results:
SELECT `MD5` FROM cv_attribute GROUP BY `MD5` HAVING COUNT(*) > 1;
Empty set (11.51 sec)
I'm totally clueless how the complete query (which basically is just a SELECT DISTINCT *) never finishes, while it's IN-clause returns an empty set within 11 seconds. So, to my understanding, this SELECT DISTINCT * would just query an empty list and be pretty fast.
Note, this is the explain of the full query:
explain SELECT DISTINCT * FROM cv_attribute WHERE `MD5` IN (SELECT `MD5` FROM cv_attribute GROUP BY `MD5` HAVING COUNT(*) > 1);
+------+--------------------+--------------+------+---------------+------+---------+------+--------+------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+--------------------+--------------+------+---------------+------+---------+------+--------+------------------------------+
| 1 | PRIMARY | cv_attribute | ALL | NULL | NULL | NULL | NULL | 505223 | Using where; Using temporary |
| 2 | DEPENDENT SUBQUERY | cv_attribute | ALL | NULL | NULL | NULL | NULL | 505223 | Using temporary |
+------+--------------------+--------------+------+---------------+------+---------+------+--------+------------------------------+
2 rows in set (0.00 sec)
I hope anybody here would see the light!
Thanks a lot! Carl
Upvotes: 0
Views: 96