Steve F
Steve F

Reputation: 1567

How to improve MySQL DELETE query performance

Using: MySQL 5.6 on Windows, with default configuration file settings in my.ini

Table: datatbl1

row_id   | emailaddr    | valid
--------------------------------  
INT, PK  | VARCHAR(255) | BIT

Both emailaddr and row_id columns each have an index defined.

There are 600,000 rows in the table, and the objective is to remove duplicates. The query is:

delete dt2 from datatbl1 dt1 JOIN datatbl1 dt2 on (dt1.emailaddr = dt2.emailaddr) and (dt1.row_id < dt2.row_id);

On my system, it takes about 15 minutes to complete this query, I watch the mysqld process in the Task Manager, processor usage is 100% the whole time, but memory usage never crosses about 140MB, even though there is about 3GB of memory (RAM) installed and plenty memory available.

Questions:

  1. Can I change some configuration parameters to increase the performance?
  2. Can the query itself be rewritten to increase performance?
  3. Whats a reasonable amount of time to execute this query with say 1 - 2 million rows?

Keep in mind that this query later needs to be applied to other tables, ie delete matches in datatbl1 for records that match other tables (datatbl2, datatbl3, datatbl4 etc) which have the same table structure.

On my client's system, the same query is taking 2 hours. Difference is that he has a normal hard drive, whereas I have a SSD.

The application is a client server application with a Delphi front-end, and is meant to be used by ordinary users on Windows PCs, thus MySQL will be almost always be running on a end-user Windows PC.

Thanks in advance.

EDIT: The Explain output as requested is:

mysql> explain delete dt2 from datatbl1 dt1 JOIN datatbl1 dt2 on (dt1.emailaddr
= dt2.emailaddr) and (dt1.row_id < dt2.row_id);
+----+-------------+-------+-------+------------------------------+-------------
+---------+--------------------------+------+-------------+
| id | select_type | table | type  | possible_keys                | key
| key_len | ref                      | rows | Extra       |
+----+-------------+-------+-------+------------------------------+-------------
+---------+--------------------------+------+-------------+
|  1 | SIMPLE      | dt1   | index | PRIMARY,ixemailaddr,ixrow_id | ixemailaddr
| 257     | NULL                     |    1 | Using index |
|  1 | SIMPLE      | dt2   | ref   | PRIMARY,ixemailaddr,ixrow_id | ixemailaddr
| 257     | emailmgrdb.dt1.emailaddr |    1 | Using where |
+----+-------------+-------+-------+------------------------------+-------------
+---------+--------------------------+------+-------------+
2 rows in set (0.01 sec)

Upvotes: 0

Views: 682

Answers (2)

Barmar
Barmar

Reputation: 780843

Maybe this query will be faster:

DELETE dt1.*
FROM datatbl1 dt1
JOIN (SELECT emailaddr, MIN(row_id) minrow
      FROM datatbl1
      GROUP BY emailaddr) dt2
USING (emailaddr)
WHERE dt1.row_id > dt2.minrow

The size of the intermediate table in your original query is O(n^2) (because it joins each row with all the duplicates that follow it), but this one is O(n) (because it only joins the first row of each set of duplicates with the ones that follow it).

It depends on whether the slowness is in finding the rows or performing all the deletions. You can find this out by doing SELECT instead of DELETE and noting the performance difference.

Upvotes: 1

Aiias
Aiias

Reputation: 4748

Have you tried moving the row_id comparison to the WHERE clause?

DELETE dt1
FROM datatbl1 dt1
INNER JOIN datatbl1 dt2 ON dt1.emailaddr = dt2.emailaddr
WHERE dt1.row_id > dt2.row_id

Upvotes: 0

Related Questions