Temp Acc
Temp Acc

Reputation: 1

clean unnecessary data in database table with 100M+ rows

I have a table named Controllers which has records of some devices with unique IMEI number.

Field Type Null Key Default Extra
id int NO PRI NULL auto_increment
device_name varchar(200) NO NULL
device_location text NO NULL
imei_number varchar(15) NO NULL
voltage decimal(10,2) NO 0.00
current decimal(10,2) NO 0.00
temperature decimal(10,2) NO 0.00
fault int NO 0
created_at timestamp NO CURRENT_TIMESTAMP
updated_at timestamp NO CURRENT_TIMESTAMP DEFAULT_GENERATED on update CURRENT_TIMESTAMP

I have another table named Logs which stores logs of each device. The logs are inserted every 10 seconds from each device and now RMSLogs table has 100M+ rows

Field Type Null Key Default Extra
id int NO PRI NULL auto_increment
imei_number varchar(15) NO NULL
voltage decimal(10,2) NO 0.00
current decimal(10,2) NO 0.00
temperature decimal(10,2) NO 0.00
fault tinyint NO 0
created_at timestamp NO CURRENT_TIMESTAMP
updated_at timestamp NO CURRENT_TIMESTAMP DEFAULT_GENERATED on update CURRENT_TIMESTAMP

I want to delete the rows which have a time difference less than 10 minutes with respect to imei_number.

I tried these steps:

STEP-1 Using a simple delete query.

DELETE t1
FROM RMSLogs t1
JOIN RMSLogs t2 ON t1.id = t2.id + 1
WHERE TIMESTAMPDIFF(MINUTE, t2.created_at, t1.created_at) < 10;

When I executed the query, it affected around no more than 8500 rows and took around 2 minutes.

STEP-2 Using a stored procedure.

DELIMITER //

CREATE PROCEDURE DeleteLogsLessThan10Minutes()
BEGIN
    DECLARE done INT DEFAULT FALSE;
    DECLARE imei_number VARCHAR(255);
    DECLARE curr_timestamp DATETIME;
    DECLARE previous_timestamp DATETIME;
    
    -- Declare cursor to fetch distinct IMEI numbers from RmsController table
    DECLARE cur CURSOR FOR 
        SELECT imei_number FROM RMSController;
    
    -- Declare continue handler for cursor
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;
    
    OPEN cur;
    
    read_loop: LOOP
        FETCH cur INTO imei_number;
        IF done THEN
            LEAVE read_loop;
        END IF;
        
        -- Temporary table to hold log entries for the current IMEI number
        CREATE TEMPORARY TABLE IF NOT EXISTS Temp_Logs (
            created_at DATETIME
        );
        
        -- Insert log entries for the current IMEI number into the temporary table
        INSERT INTO Temp_Logs (created_at)
        SELECT created_at
        FROM RMSLog
        WHERE imei_number = imei_number
        ORDER BY created_at;
        
        -- Initialize variables for timestamp comparison
        SET previous_timestamp = NULL;
        
        -- Loop through the log entries for the current IMEI number
        WHILE (SELECT COUNT(*) FROM Temp_Logs) > 0 DO
            SELECT MIN(created_at) INTO curr_timestamp FROM Temp_Logs;
            
            IF previous_timestamp IS NOT NULL AND TIMESTAMPDIFF(MINUTE, previous_timestamp, curr_timestamp) < 10 THEN
                -- Delete the log entry if the time difference is less than 10 minutes
                DELETE FROM RMSLog
                WHERE imei_number = imei_number AND created_at = curr_timestamp;
            END IF;
            
            SET previous_timestamp = curr_timestamp;
            
            -- Remove the processed log entry from the temporary table
            DELETE FROM Temp_Logs WHERE created_at = curr_timestamp;
        END WHILE;
        
        -- Drop the temporary table
        DROP TEMPORARY TABLE IF EXISTS Temp_Logs;
        
    END LOOP;
    
    CLOSE cur;
    
END //

DELIMITER ;

When I executed this stored procedure it resulted in a timeout.

Upvotes: 0

Views: 90

Answers (1)

Rick James
Rick James

Reputation: 142208

For purging data, walk through the table using the PRIMARY KEY in chunks of 1000. For each 1000, delete what you can. More: https://mysql.rjweb.org/doc.php/deletebig

For other shrinking of the table,

  • Use smaller datatypes where practical. Surely, no temp will ever bey as high as 99999999.99! That takes 5 bytes. DECIMAL(5,2) takes only 3 bytes.
  • imei_number can probably be DECIMAL(15,0) which takes 7 bytes instead of `VARCHAR(...), which takes 17 bytes.
  • Normalize. I would guess that location is being repeated a lot.
  • Do you really need updated_at? (TIMESTAMP is 5 bytes.)
  • More: https://mysql.rjweb.org/doc.php/mysql_sensor

If you will be saving only the last N days of data, consider Partitioning as an efficient way of deleting old data.

Upvotes: 0

Related Questions