w.k
w.k

Reputation: 8376

Why DELETE with subquery is much slower than with simple list of IDs?

I wanted to delete lots of rows from medium size (700K) table, based on primary key. Thought, the best way should use SELECT-subquery for DELETE source list. And found specific answer here too. Problem is: it is so much slower than using two separate queries (first select IDs and then delete those IDs from table). Why is that so?

I made simple test case too:

CREATE TABLE `xyz` (
  `xyzID` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `col1` int(10) unsigned NOT NULL DEFAULT '0',
  PRIMARY KEY (`xyzID`)
) ENGINE=InnoDB;

Populated it with million records, and then:

DELETE FROM xyz
WHERE xyzID IN
        (
        SELECT xyzID
        FROM
            (
                SELECT xyzID
                FROM xyz
                LIMIT 3000,1000
            ) a
        );
Query OK, 1000 rows affected (53.52 sec)

Deleting 2000 rows doubles time:

Query OK, 2000 rows affected (1 min 48.25 sec)

But deleting without subquery (made select first) took almost no time (id-list generated by random, here):

DELETE FROM test.xyz WHERE xyzID IN ( 660422,232794,573802,....
Query OK, 996 rows affected (0.04 sec)

Why is deleting with subquery so slow?

Upvotes: 2

Views: 2009

Answers (3)

grahamparks
grahamparks

Reputation: 16296

The first step to solving this problem is to select the ids you want to delete into a temporary table. However, you might still run into the slow subquery problem when you try to actually do the delete.

The solution to that is to use DELETE xyz FROM xyz INNER JOIN xyz_temp WHERE xyz.id = xyz_temp.id syntax, which achieves the same thing and runs as fast as a simple join.

Upvotes: 0

dognose
dognose

Reputation: 20889

If you read the documentation on subqueries, you will find some things that might be the cause for this: https://dev.mysql.com/doc/refman/5.7/en/subquery-restrictions.html

The optimizer will rewrite your uncorrelated WHERE IN (Subquery) statements to correlated statements using exists.

So, your query might actually be executed like this:

DELETE FROM xyz t1
WHERE EXISTS (
    (
    SELECT 1
    FROM
        (
            SELECT xyzID t3
            FROM xyz
            LIMIT 3000,1000
        ) a
    where t1.xyzID = a.xyzID
    );

The correlated subquery now needs to be executed Everytime a single row is deleted.

So: For 1000 deletions, you will run 1000 subqueries on the temporary table a. Only the inner query will remain uncorrelated.

Compared to in(valuelist) you are running 1001 queries rather than 1.

docu:

An implication is that an IN subquery can be much slower than a query written using an IN(value_list) operator that lists the same values that the subquery would return.

Upvotes: 5

Luca Giardina
Luca Giardina

Reputation: 518

Subqueries means you are asking your db engine to compare all the "N" rows in the first table with all the "M" rows in another table you are creating in that moment. That's means you have N*M compare operation and to do it, you need to join the tables. The table you are building have N * M rows.

Without subquery you are just comparing all the "N" row in your table with "X" keywords where "X" << "M".

Upvotes: -1

Related Questions