Reputation: 10755
I am trying to delete data from a table that contains almost 6,000,000,000 records, using a WHERE clause.
Here is the stored procedure I am using, run from the MySQL command prompt on Windows.
DELIMITER $$
CREATE DEFINER=`root`@`localhost` PROCEDURE `clean_table`( )
BEGIN
REPEAT
DO SLEEP(1);
DELETE FROM tablename
WHERE exportInfoId<=8479 limit 100000;
SELECT ROW_COUNT();
UNTIL ROW_COUNT() = 0 END REPEAT;
END$$
DELIMITER ;
It is deleting the data, but the time for each delete transaction keeps increasing. Why does it keep increasing, even though each delete reduces the data in the table? Is there any way to make each transaction take the same amount of time? I am already using SLEEP as suggested in some other answers.
Upvotes: 1
Views: 1204
Reputation: 1672
You need to add an index on the column you are using to find the record(s) to be deleted.
With an index, MySQL knows exactly where the records are, so it can go straight to them.
Without an index, the table must be searched row by row.
The difference is that without an index the deletes are performed in the order of the primary key, whereas with an index the records are found in the order of that particular column.
To add an index, do what @o-jones points out in the comments, which is:
-- Normal index
ALTER TABLE tablename ADD INDEX exportInfoId (exportInfoId);
-- Descending index
ALTER TABLE tablename ADD INDEX exportInfoId (exportInfoId DESC);
Adding an index to the column is the correct answer here; however, there may be other approaches that work for you, depending on your use case.
Upvotes: 2
Reputation: 142258
Sure, adding an index would help a lot. (DESC is optional.)
Sure, it will take time to add an index to 6B rows. Newer versions of MySQL can do it "instantly", but that just means the build happens in the background and the index is not usable for a long time.
The reason for the slowdown is that the DELETEs are skipping over more and more rows that are not to be deleted. The INDEX would avoid that. (As Akina described.)
A note about DELETE (and UPDATE): it is slow because it keeps the deleted rows in case of a ROLLBACK (which would happen after a power failure).
There are other techniques. See http://mysql.rjweb.org/doc.php/deletebig . The main one that comes to mind for your task is to walk through the table 1K rows at a time based on the PRIMARY KEY. (Note that this avoids repeatedly skipping over rejected rows.)
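Walking the table by PRIMARY KEY could be sketched roughly as follows. This is an illustrative sketch only: it assumes an AUTO_INCREMENT primary key named `id` and the table/column names from the question, so adjust to your actual schema.

```sql
DELIMITER $$
CREATE PROCEDURE clean_table_by_pk()
BEGIN
    -- Assumes an AUTO_INCREMENT PRIMARY KEY named `id` (hypothetical name).
    DECLARE v_left BIGINT;
    DECLARE v_max  BIGINT;
    SELECT MIN(id), MAX(id) INTO v_left, v_max FROM tablename;
    WHILE v_left <= v_max DO
        -- Each pass scans only one narrow PK range, so rows that are
        -- kept are never re-scanned on later passes.
        DELETE FROM tablename
        WHERE id >= v_left AND id < v_left + 1000
          AND exportInfoId <= 8479;
        SET v_left = v_left + 1000;
    END WHILE;
END$$
DELIMITER ;
```

Because each DELETE touches at most 1K rows of a bounded PK range, each chunk takes roughly constant time instead of growing as the scan skips more and more surviving rows.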
If "most" of the table is to be deleted, copying the table over is the best way. (See that link.) Note also that the table will not shrink after DELETEs; the freed-up space will be reused, to some extent. (As Bill commented.)
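The copy-over technique amounts to keeping the rows you want rather than deleting the ones you don't. A minimal sketch, assuming the names from the question and that writes to the table are paused while it runs:

```sql
-- Build an empty copy with the same schema and indexes.
CREATE TABLE tablename_new LIKE tablename;

-- Copy only the rows you intend to keep.
INSERT INTO tablename_new
    SELECT * FROM tablename
    WHERE exportInfoId > 8479;

-- Swap the tables atomically, then discard the old data.
RENAME TABLE tablename TO tablename_old,
             tablename_new TO tablename;
DROP TABLE tablename_old;
```

This also reclaims the disk space immediately, which a plain DELETE does not.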
If you are deleting "old" data such that PARTITIONing would work, then DROP PARTITION is virtually instantaneous on all versions (since 5.1?) of MySQL. (That is also mentioned in my link.) (Partitioning is usually used for "time-series"; does exportInfoId work like a "time"?)
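For illustration, if the table were RANGE-partitioned on exportInfoId, the purge would look like this. Note these partition names and boundary values are made up, the partitioning key must be part of every unique key on the table, and repartitioning an existing 6B-row table is itself a full table copy, so this pays off mainly for purges you will repeat:

```sql
-- Hypothetical RANGE partitioning on exportInfoId:
ALTER TABLE tablename
PARTITION BY RANGE (exportInfoId) (
    PARTITION p_old  VALUES LESS THAN (8480),
    PARTITION p_new  VALUES LESS THAN MAXVALUE
);

-- Dropping a whole partition is near-instant; no row-by-row DELETE.
ALTER TABLE tablename DROP PARTITION p_old;
```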
To Alex, I say "there may be other answers".
As for 100K vs 5K rows: I like 1K because above that you get into "diminishing returns". And at 100K you might hit various buffer-size settings that actually cause 100K to run slower.
Upvotes: 1