Michael Aquilina
Michael Aquilina

Reputation: 5520

DELETE query slow on MySql

I have an SQL Query whose purpose is to "prune" items in a table which have an occurrence of less than 2. The table is quite large, and the number of items to prune is also quite large (~21000 rows).

The delete operation I am running is as follows:

DELETE FROM Terms
WHERE TermID IN (
    SELECT TermID
    FROM TermOccurrences
    GROUP BY TermID
    HAVING SUM(Counter) <=2
);

The query is so slow, that I have yet to see it complete (and I have tried to rerun it numerous times). I should also note, that my CPU/Disk Monitor shows high CPU usage on one core but very little I/O on disk

The Inner SELECT Query is fast if I run it independently, so I do not think that is the bottleneck. Here is an EXPLAIN of the Inner Query:

+----+-------------+-----------------+------+---------------+------+---------+------+--------+---------------------------------+
| id | select_type | table           | type | possible_keys | key  | key_len | ref  | rows   | Extra                           |
+----+-------------+-----------------+------+---------------+------+---------+------+--------+---------------------------------+
|  1 | SIMPLE      | TermOccurrences | ALL  | NULL          | NULL | NULL    | NULL | 113135 | Using temporary; Using filesort |
+----+-------------+-----------------+------+---------------+------+---------+------+--------+---------------------------------+

I should also note that the TermOccurrences table has foreign key constraint on Terms.TermID and it is set to cascade delete.

The Terms table has a primary key index on TermID and TermOccurrences has a composite primary key index on two columns (TermID, PageID).

The tables are all using the InnoDB engine (but I have also tried MyISAM with no difference).

I am relatively new to MySQL so I am not sure how to go about figuring out this problem. Searching online hasnt been of much help to me so I was hoping someone here could show me to ropes to profiling this query / figuring out what is causing it to take so long.

If any more information is required to help solve this problem please let me know.

Upvotes: 0

Views: 91

Answers (1)

juergen d
juergen d

Reputation: 204756

The IN() clause can be slow when containing many items. Try a JOIN instead

DELETE T1
FROM Terms t1
JOIN 
(
    SELECT TermID
    FROM TermOccurrences
    GROUP BY TermID
    HAVING SUM(Counter) <=2
) T2 on T1.TermID = T2.TermID

Upvotes: 1

Related Questions