Reputation: 5520
I have an SQL Query whose purpose is to "prune" items in a table which have an occurrence of less than 2. The table is quite large, and the number of items to prune is also quite large (~21000 rows).
The delete operation I am running is as follows:
DELETE FROM Terms
WHERE TermID IN (
SELECT TermID
FROM TermOccurrences
GROUP BY TermID
HAVING SUM(Counter) <=2
);
The query is so slow, that I have yet to see it complete (and I have tried to rerun it numerous times). I should also note, that my CPU/Disk Monitor shows high CPU usage on one core but very little I/O on disk
The Inner SELECT Query is fast if I run it independently, so I do not think that is the bottleneck. Here is an EXPLAIN of the Inner Query:
+----+-------------+-----------------+------+---------------+------+---------+------+--------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------------+------+---------------+------+---------+------+--------+---------------------------------+
| 1 | SIMPLE | TermOccurrences | ALL | NULL | NULL | NULL | NULL | 113135 | Using temporary; Using filesort |
+----+-------------+-----------------+------+---------------+------+---------+------+--------+---------------------------------+
I should also note that the TermOccurrences table has foreign key constraint on Terms.TermID and it is set to cascade delete.
The Terms table has a primary key index on TermID and TermOccurrences has a composite primary key index on two columns (TermID, PageID).
The tables are all using the InnoDB engine (but I have also tried MyISAM with no difference).
I am relatively new to MySQL so I am not sure how to go about figuring out this problem. Searching online hasnt been of much help to me so I was hoping someone here could show me to ropes to profiling this query / figuring out what is causing it to take so long.
If any more information is required to help solve this problem please let me know.
Upvotes: 0
Views: 91
Reputation: 204756
The IN()
clause can be slow when containing many items. Try a JOIN
instead
DELETE T1
FROM Terms t1
JOIN
(
SELECT TermID
FROM TermOccurrences
GROUP BY TermID
HAVING SUM(Counter) <=2
) T2 on T1.TermID = T2.TermID
Upvotes: 1