Cbpro Ads
Cbpro Ads

Reputation: 55

MySQL performance tuning for DELETE query

Can any one help me to re-write the query to speed up the execution time? It took 37 seconds to execute.

DELETE FROM storefront_categories 
WHERE userid IN (SELECT userid 
                FROM MASTER 
                where expirydate<'2020-2-4' 
                )

At the same time, this query took only 4.69 seconds only to execute.

DELETE FROM storefront_categories 
WHERE userid NOT IN (SELECT userid FROM MASTER)

The table storefront_categories have 97K records where as in MASTER have 40K records. We have created a index on MASTER.expirydate field.

Upvotes: 2

Views: 2177

Answers (4)

Rick James
Rick James

Reputation: 142540

When deleting 40K rows, expect it to take time. The main cost (assuming adequate indexing and a decent query) is the overhead of transactional semantics of an "atomic" delete. This involves making a copy of each row being deleted, just in case there is a crash. That way, InnoDB can bring the database back to what it had been before the crash.

When deleting 40% of a table, it is much faster to copy the rows to keep into another table then swap tables.

When deleting a large number of rows (regardless of the percentage), it is better to do it in chunks. And it is best to walk through the table based on the PRIMARY KEY.

I discuss both of those techniques, plus others, in http://mysql.rjweb.org/doc.php/deletebig

As for the query formulation:

  • It is version-dependent; old versions of MySQL did a poor job on some flavors.
  • NOT IN (SELECT ...) and NOT EXISTS tend to be the worst performers.
  • IN (SELECT ...) and/or EXISTS may be better.
  • "Multi-table DELETE is another option. It works like JOIN.
  • (Bottom line: You did not say what version you are running; I can't predict which formulation will be best.)
  • My blog avoids the formulation debate.

Upvotes: 2

Gordon Linoff
Gordon Linoff

Reputation: 1271151

I would advise you to use NOT EXISTS with the correct index:

DELETE sc
    FROM storefront_categories sc
    WHERE NOT EXISTS (SELECT 1
                      FROM master m
                      WHERE m.userid = sc.userid AND
                            m.expirydate < '2020-02-04' 
                     );

The index you want is on master(userid, expirydate). The order of the columns is important. For this version, an index on storefront_categories does not help.

Note that I changed the date format. I recommend using YYYY-MM-DD to avoid ambiguity -- and to use the full 10 characters.

Upvotes: 1

Yogesh Sharma
Yogesh Sharma

Reputation: 50173

I would try with exists :

DELETE 
FROM storefront_categories 
WHERE EXISTS (SELECT 1 
              FROM MASTER M 
              WHERE M.userid = storefront_categories.userid AND
                    M.expirydate <'2020-02-04'  
              );

Index would be metter here i would expect index on storefront_categories(userid) & MASTER(userid, expirydate).

Upvotes: 1

GMB
GMB

Reputation: 222722

The query looks fine as it is.

I would suggest the following indexes for optimization:

master(expiry_date, userid)
storefront_categories(userid)

The first index is a covering index for the subquery on master: it means that the database should be able to execute the subquery by looking at the index only (whereas with just expiry_date in the index, it still needs to look at the table data to fetch the related userid).

The second index lets the database optimize the in operation.

Upvotes: 1

Related Questions