Reputation: 55
Can any one help me to re-write the query to speed up the execution time? It took 37 seconds to execute.
DELETE FROM storefront_categories
WHERE userid IN (SELECT userid
FROM MASTER
where expirydate<'2020-2-4'
)
At the same time, this query took only 4.69 seconds only to execute.
DELETE FROM storefront_categories
WHERE userid NOT IN (SELECT userid FROM MASTER)
The table storefront_categories have 97K records where as in MASTER have 40K records. We have created a index on MASTER.expirydate field.
Upvotes: 2
Views: 2177
Reputation: 142540
When deleting 40K rows, expect it to take time. The main cost (assuming adequate indexing and a decent query) is the overhead of transactional semantics of an "atomic" delete. This involves making a copy of each row being deleted, just in case there is a crash. That way, InnoDB can bring the database back to what it had been before the crash.
When deleting 40% of a table, it is much faster to copy the rows to keep into another table then swap tables.
When deleting a large number of rows (regardless of the percentage), it is better to do it in chunks. And it is best to walk through the table based on the PRIMARY KEY
.
I discuss both of those techniques, plus others, in http://mysql.rjweb.org/doc.php/deletebig
As for the query formulation:
NOT IN (SELECT ...)
and NOT EXISTS
tend to be the worst performers.IN (SELECT ...)
and/or EXISTS
may be better.DELETE
is another option. It works like JOIN
.Upvotes: 2
Reputation: 1271151
I would advise you to use NOT EXISTS
with the correct index:
DELETE sc
FROM storefront_categories sc
WHERE NOT EXISTS (SELECT 1
FROM master m
WHERE m.userid = sc.userid AND
m.expirydate < '2020-02-04'
);
The index you want is on master(userid, expirydate)
. The order of the columns is important. For this version, an index on storefront_categories
does not help.
Note that I changed the date format. I recommend using YYYY-MM-DD to avoid ambiguity -- and to use the full 10 characters.
Upvotes: 1
Reputation: 50173
I would try with exists
:
DELETE
FROM storefront_categories
WHERE EXISTS (SELECT 1
FROM MASTER M
WHERE M.userid = storefront_categories.userid AND
M.expirydate <'2020-02-04'
);
Index would be metter here i would expect index on storefront_categories(userid) & MASTER(userid, expirydate)
.
Upvotes: 1
Reputation: 222722
The query looks fine as it is.
I would suggest the following indexes for optimization:
master(expiry_date, userid)
storefront_categories(userid)
The first index is a covering index for the subquery on master
: it means that the database should be able to execute the subquery by looking at the index only (whereas with just expiry_date
in the index, it still needs to look at the table data to fetch the related userid
).
The second index lets the database optimize the in
operation.
Upvotes: 1