Reputation: 8376
I wanted to delete lots of rows from medium size (700K) table, based on primary key. Thought, the best way should use SELECT
-subquery for DELETE
source list. And found specific answer here too. Problem is: it is so much slower than using two separate queries (first select IDs and then delete those IDs from table). Why is that so?
I made simple test case too:
CREATE TABLE `xyz` (
`xyzID` int(10) unsigned NOT NULL AUTO_INCREMENT,
`col1` int(10) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`xyzID`)
) ENGINE=InnoDB;
Populated it with million records, and then:
DELETE FROM xyz
WHERE xyzID IN
(
SELECT xyzID
FROM
(
SELECT xyzID
FROM xyz
LIMIT 3000,1000
) a
);
Query OK, 1000 rows affected (53.52 sec)
Deleting 2000 rows doubles time:
Query OK, 2000 rows affected (1 min 48.25 sec)
But deleting without subquery (made select first) took almost no time (id-list generated by random, here):
DELETE FROM test.xyz WHERE xyzID IN ( 660422,232794,573802,....
Query OK, 996 rows affected (0.04 sec)
Why is deleting with subquery so slow?
Upvotes: 2
Views: 2009
Reputation: 16296
The first step to solving this problem is to select the ids you want to delete into a temporary table. However, you might still run into the slow subquery problem when you try to actually do the delete.
The solution to that is to use DELETE xyz FROM xyz INNER JOIN xyz_temp WHERE xyz.id = xyz_temp.id
syntax, which achieves the same thing and runs as fast as a simple join.
Upvotes: 0
Reputation: 20889
If you read the documentation on subqueries, you will find some things that might be the cause for this: https://dev.mysql.com/doc/refman/5.7/en/subquery-restrictions.html
The optimizer will rewrite your uncorrelated WHERE IN (Subquery)
statements to correlated statements using exists
.
So, your query might actually be executed like this:
DELETE FROM xyz t1
WHERE EXISTS (
(
SELECT 1
FROM
(
SELECT xyzID t3
FROM xyz
LIMIT 3000,1000
) a
where t1.xyzID = a.xyzID
);
The correlated subquery now needs to be executed Everytime a single row is deleted.
So: For 1000 deletions, you will run 1000 subqueries on the temporary table a
.
Only the inner query will remain uncorrelated.
Compared to in(valuelist)
you are running 1001
queries rather than 1
.
docu:
An implication is that an IN subquery can be much slower than a query written using an IN(value_list) operator that lists the same values that the subquery would return.
Upvotes: 5
Reputation: 518
Subqueries means you are asking your db engine to compare all the "N" rows in the first table with all the "M" rows in another table you are creating in that moment. That's means you have N*M compare operation and to do it, you need to join the tables. The table you are building have N * M rows.
Without subquery you are just comparing all the "N" row in your table with "X" keywords where "X" << "M".
Upvotes: -1