Reputation: 9148
This query needs to delete over 17 million rows, from a table containing 20 million.
DELETE
FROM statements
WHERE agreement_id IN
(SELECT id
FROM agreements
WHERE created < DATE_SUB(CURDATE(), INTERVAL 6 MONTH));
DELETE
FROM agreements
WHERE created < DATE_SUB(CURDATE(), INTERVAL 6 MONTH)
It takes hours to run, am I missing something that could speed things up a bit?
The subselect by itself takes a few seconds, I don't understand why the delete takes so long.
Upvotes: 1
Views: 164
Reputation: 142540
Use a "multi-table delete" instead of the usually inefficient IN ( SELECT ... )
.
Several techniques for large deletes are discussed here.
To delete 85% of the table, it is really best to build a new table with the 15% you are keeping, then swap the table into place. (More on that in the link above.)
Upvotes: 1
Reputation: 37507
Try to rewrite the first statement to use EXISTS
.
DELETE FROM statements
WHERE EXISTS (SELECT *
FROM agreements
WHERE agreements.id = statements.aggreement_id
AND agreements.created < date_sub(curdate(), interval 6 month));
And put an index on agreements (id, created)
(if not already there).
CREATE INDEX agreements_id_created
ON agreements
(id,
created);
For the second one create an index on agreements (created)
(if not already there).
CREATE INDEX agreements_created
ON agreements
(created);
Upvotes: 1
Reputation: 30663
If you have this much delete to be undertaken. I suggest you to:
or
Also for your query,
never use IN clause for BIG data. Instead use exists which is more performant.
Basic script:
CREATE TABLE tmp_statements as
SELECT * FROM statements s where exists
(
select 1 FROM agreements a
WHERE
created < DATE_SUB(CURDATE(), INTERVAL 6 MONTH AND
s.agreement_id = a.agreement_id
));
DROP TABLE statements;
RENAME TABLE tmp_statements TO statements ;
--DONT FORGET TO RECREATE CREATE YOUR INDEXES, CONSTRAINTS;
Upvotes: 1