Michael
Michael

Reputation: 2867

LOAD DATA reclaim disk space after delete

I have a DB schema composed of MYISAM tables, i am interested to delete old records from time to time from some of the tables.

I know that delete does not reclaim the memory space, but as i found in a description of DELETE command, inserts may reuse the space deleted

In MyISAM tables, deleted rows are maintained in a linked list and subsequent INSERT operations reuse old row positions.

I am interested if LOAD DATA command also reuses the deleted space?

UPDATE

I am also interested how the index space reclaimed?

UPDATE 2012-12-03 23:11

some more info supplied based on the answer received from @RolandoMySQLDBA

after executing the following suggested query i got different results for different tables for which space need to be reused or reclaimed:

SELECT row_format FROM information_schema.tables
WHERE table_schema='mydb' AND table_name='mytable1';

> Dynamic

SELECT row_format FROM information_schema.tables
WHERE table_schema='mydb' AND table_name='mytable2';

> Fixed

UPDATE 2012-12-09 08:06

LOAD DATA do reuses previously deleted space (i have checked it by running a short script) if and only if the row format is fixed or (the row format is dynamic and there is a deleted row with exactly the same size).

it seems that if the row_format is dynamic, full look-up over the deleted list is made for each record , and if the exact row size is not found , the deleted record is not used, and the table memory usage will raise, additionally LOAD DATA will take much more time to import records.

I will except the answer given here , since it describes all the process perfectly.

Upvotes: 4

Views: 3364

Answers (1)

RolandoMySQLDBA
RolandoMySQLDBA

Reputation: 44343

For a MySQL table called mydb.mytable just run the following:

OPTIMIZE TABLE mydb.mytable;

You could also do this in stages:

CREATE TABLE mydb.mytable_new LIKE mydb.mytable;
ALTER TABLE mydb.mytable_new DISABLE KEYS;
INSERT INTO mydb.mytable_new SELECT * FROM mydb.mytable;
ALTER TABLE mydb.mytable_new ENABLE KEYS;
ALTER TABLE mydb.mytable RENAME mydb.mytable_old;
ALTER TABLE mydb.mytable_new RENAME mydb.mytable;
ALTER TABLE mydb.mytable_old;
ANALYZE TABLE mydb.mytable;

In either case, the table ends up with no fragmentation.

Give it a Try !!!

UPDATE 2012-12-03 12:50 EDT

If you are concerned whether or not rows are reused upon bulk INSERTs via LOAD DATA INFILE, please note the following:

When you created the MyISAM table, I assumed the default row format would be dynamic. You can check what it is with either

SHOW CREATE TABLE mydb.mytable\G

or

SELECT row_format FROM information_schema.tables
WHERE table_schema='mydb' AND table_name='mytable';

Since the row format of your table is Dynamic, the fragmented rows are of various sizes. The MyISAM storage engine would have keep checking for the row length of each deleted to see if the next set of data being insert will fit. If the incoming data cannot fit in any of the deleted rows, then the new row data is appended.

The presence of such rows can make myisamchk struggle.

This is why I recommended running OPTIMIZE TABLE. That way, data would be appended quicker.

UPDATE 2012-12-03 12:58 EDT

Here is something interesting you can also do: Try setting concurrent_insert to 2. That way, you are always appending to a MyISAM table without checking for gaps in the table. This will speed up INSERTs dramatically but leave all known gaps alone.

You could still defragment your table at your earliest convenience using OPTIMIZE TABLE.

UPDATE 2012-12-03 13:40 EDT

Why don't run the my second sugesstion

CREATE TABLE mydb.mytable_new LIKE mydb.mytable;
ALTER TABLE mydb.mytable_new DISABLE KEYS;
INSERT INTO mydb.mytable_new SELECT * FROM mydb.mytable;
ALTER TABLE mydb.mytable_new ENABLE KEYS;
ALTER TABLE mydb.mytable RENAME mydb.mytable_old;
ALTER TABLE mydb.mytable_new RENAME mydb.mytable;
ANALYZE TABLE mydb.mytable;

This will give you an idea

  • How long OPTIMIZE TABLE would take to run
  • How much smaller the .MYD and .MYI would be after running OPTIMIZE TABLE

After you run my second suggestion, you can compare them with

SELECT
    A.mydsize,B.mydsize,A.mydsize - B.mydsize myd_diff,
    A.midsize,B.myisize,A.myisize - B.myisize myi_diff
FROM
(
    SELECT data_length mydsize,index_length myisize
    FROM information_schema.tables
    WHERE table_schema='mydb' AND table_name='mytable'
) A,
(
    SELECT data_length mydsize,index_length myisize
    FROM information_schema.tables
    WHERE table_schema='mydb' AND table_name='mytable_new'
) B;

UPDATE 2012-12-03 16:42 EDT

Any table whose ROW_FORMAT is set to fixed has the luxury of allocating the same length row every time. If MyISAM tables maintain a list of deleted rows, the very first row in the list should always be selected as the next row to insert data. There would be no need to traverse a whole list until a suitable row gaps with sufficient length is found. Each deleted row is quickly appended after a DELETE. Each INSERT would pick the first row of the deleted rows.

We can assume these things because MyISAM tables can do concurrent inserts. In order for this feature to be available via the concurrent_insert option, INSERTs into a MyISAM table must be able to detect one of three(3) things:

  1. The presence of a list of deleted rows, thus choosing from the list
    • Row_Format=Dynamic : list of deleted rows with each row with a different length
    • Row_Format=Fixed : list of deleted rows with all rows the same length
  2. The absence of a list of deleted rows, thus appending
  3. Bypass checking for the presence of a list of deleted rows (set concurrent_insert to 2)

For detection #1 to be the fastest possible, a MyISAM table's row_format must be Fixed. If it is Dynamic, it is very possible that a list traversal is necessary.

Upvotes: 5

Related Questions