Wallace
Wallace

Reputation: 651

MySQL Delete with Sub-Query Costs Hundreds of Seconds While Select Query Costs Much Less

I have two tables which are defined as below:

CREATE TABLE `a`  (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `cid` varchar(100) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL DEFAULT '',
  `data` varchar(1024) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL DEFAULT '',
  `update_time` timestamp(0) NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP(0),
  PRIMARY KEY (`id`) USING BTREE
) ENGINE = InnoDB AUTO_INCREMENT = 150 CHARACTER SET = utf8 COLLATE = utf8_general_ci ROW_FORMAT = Compact;

there are 10k rows in a and 200k rows in b, all the data were generated by a random function. Below is a sample:

151 8VE6BU06    8VE6BU06    2019-09-12 23:07:39

Here are three tests,

1. Case 1 costs 2.889s

SELECT cid FROM `a` WHERE a.cid not in (select b.cid from b);

execution plan
1   PRIMARY            a    ALL                 10094   Using where
2   SUBQUERY           b    ALL                 199826      

2. case 2 costs 628.699s

delete from `a` WHERE a.cid not in (select b.cid from b);

execution plan
1   PRIMARY             a   ALL                 10094   Using where
2   DEPENDENT SUBQUERY  b   ALL                 199826  Using where

3. case 3 costs 0.036s

alter table b add index cid(cid);
delete from `a` WHERE a.cid not in (select b.cid from b);

execution plan                           
1   PRIMARY             a   ALL                             10094   Using where
2   DEPENDENT SUBQUERY  b   index_subquery  cid cid 302 func    1   Using index

Question

  1. Why test2 takes so long?
  2. Why test1 uses SUBQUERY while test2 uses DEPENDENT SUBQUERY? What's the reason test1 executes so fast comparing to test2?
  3. Why test3 is so fast comparing to test2?

MySQL Version: 5.6.20 innodb

Row data in table b is like:

151 8VE6BU06    8VE6BU06      2019-09-12 23:07:39 

Upvotes: 2

Views: 478

Answers (3)

Rick James
Rick James

Reputation: 142278

  • 5.5 parses and optimizes DELETE and UPDATE differently (not as good) as SELECT. A later version (5.7? or 8.0) does better.
  • DELETE must create undo records in case of a crash midway through the DELETE; this is quite costly relative to just selecting.
  • Notice how the first two EXPLAINs say ALL and ALL. this implies that "for each row in one table, it scans all the rows of the other table". The third explain is orders of magnitude faster because of the INDEX.
  • In older versions, NOT IN ( SELECT ... ) was usually performed very poorly -- as in complete scans, etc.
  • Consider changing to a "multi-table DELETE" using a LEFT JOIN ... IS NULL instead of the IN approach.
  • If you are timing a 'cold' system, then there is a lot of I/O.
  • If innodb_buffer_pool_size is too small to hold all of b, this would lead to a lot of I/O, hence the long time for query 2.
  • Not knowing the average size of cid and data, it is not practical to discuss the I/O needs further.

Upvotes: 4

John
John

Reputation: 3535

  1. Read is faster than write/delete as @shadow commented.

  2. Refer to https://dev.mysql.com/doc/refman/5.5/en/subquery-optimization.html

  3. Indexing

Indexing will give you great performance.

To illustrate in simple way, index is like page0 of document.

Page0 index
Mysql p1
Oracle p2
...
Postgresql p100

Page1 
Some looooooong Mysql material

Page2 
Some looooooong Oracle material
...

Page100 
Some looooooong Postgresql material

Read with index

It takes relatively short time to find key within page0. For example, after finding postgresql in index and go to page 100, this way is much faster than reading hundreds pages.

Delete with index

You can delete only postgresql in page0 index. You don't have to erase all content in page100.

For more information, please refer to How does database indexing work?

Upvotes: 0

daShier
daShier

Reputation: 2116

Are these truly measured in seconds? That seems very excessive for such simple queries on moderate size tables.

That said, in general a DELETE will take longer than a SELECT for the same data. The database engine needs to assure that any foreign key references are not broken and update indexes etc. This behavior seems normal.

Upvotes: 1

Related Questions