Visinet
Visinet

Reputation: 11

Best practice to delete large number of rows from MySQL Innodb large table

I need to delete a large number of rows, say ~30M, from a large table with say 300M rows, and here is what I have come up with after researching this. Plan is to use a stored procedure to run a loop deleting rows in batches of 10K until it is complete and just let it run overnight. BTW server innodb buffer pool size = 20G

How does this look? Any sugestions, comments or concerns?

A big thanks in advance for any advice on the best way to do this.

DROP PROCEDURE IF EXISTS Mydb.sp_clean_HTOv1;
DELIMITER $$
CREATE PROCEDURE Mydb.sp_clean_HTOv1()
BEGIN
    DECLARE FinishedFlag INT DEFAULT 0;
    REPEAT
        DO SLEEP(1);
        DELETE FROM Mydb.aaahtocopy 
        WHERE
            chindex not like '%.1.1.%' AND          
            ((ipid>=20716 and ipid<=20721)
            OR (ipid=20736 or ipid=20737 or ipid=20738 or ipid=20751 or
            ipid=20806 or ipid=20807 or ipid=20808 or ipid=20821 ))
            LIMIT 10000; 
            if (ROW_COUNT() <= 0) then 
                SET FinishedFlag = 1;
            end if;
        COMMIT; 
    UNTIL FinishedFlag = 1 
END REPEAT;
END$$
DELIMITER ;

Upvotes: 1

Views: 1899

Answers (1)

Bill Karwin
Bill Karwin

Reputation: 562310

That solution should work, but it will be very slow. Every iteration of the loop does a scan of the full 300 million row table. If you can let it run overnight, that's fine. But it may be done faster.

What I would do is loop over subsets of the table in primary key order. I assume the table has an integer primary key, like many tables do. Probably the name of the primary key column is id.

Before the loop:

SET low_id = 1;
SET high_id = 10000;
SELECT MAX(id) INTO max_id FROM Mydb.aaahtocopy;

Inside the loop:

    DELETE FROM Mydb.aaahtocopy 
    WHERE id BETWEEN low_id AND high_id
        AND chindex NOT LIKE '%.1.1.%' 
        AND (          
          ipid BETWEEN 20716 AND 20721
          OR ipid IN (20736, 20737, 20738, 20751, 20806, 20807, 20808, 20821)
        );

    SET low_id = low_id + 10000;
    SET high_id = high_id + 10000;

    IF (low_id > max_id) THEN
        SET FinishedFlag = 1;
    END IF;

This way each iteration of the loop only scans at most 10000 rows. Some of these need to be deleted, depending on the other conditions. No LIMIT is necessary, because the condition on id guarantees at most 10000 rows will be matched.

This will examine each subset of the table only once. In some subsets, the conditions will match zero rows, but that's okay. In other subsets, the conditions will match up to 10000 rows.

If you have a different primary key on a different column or multiple columns, then you will have to modify this example solution.

If your table has no primary key, that's a problem. You may have to do it in the inefficient way you proposed. But this demonstrates one benefit of having a primary key.

Upvotes: 2

Related Questions