Niger
Niger

Reputation:

Why the size of MYD file is so high?

While i was creating stress data for a table i found the following files are generated.

-rw-rw---- 1 mysql mysql       8858 Jul 28 06:47 card.frm
-rw-rw---- 1 mysql mysql 7951695624 Jul 29 20:48 card.MYD
-rw-rw---- 1 mysql mysql   51360768 Jul 29 20:57 card.MYI

Actually I inserted 1985968 number of records in this table. But the index file size is unbelievable.

Structure of the table is

create table card(
    company_id int(10),
    emp_number varchar(100),
    card_date varchar(10),
    time_entry text,
    total_ot varchar(15),
    total_per varchar(15),
    leave_taken double,
    total_lop double,
    primary key (company_id,emp_number,card_date),
    index (company_id,card_date)
);

Is there any way to reduce the filesize of the MYD?

Upvotes: 1

Views: 9959

Answers (2)

hobodave
hobodave

Reputation: 29303

Please note that .MYI is your index, and .MYD is your data. The only way to reduce the size of your .MYD is to delete rows or alter your column sizes.

50MB for an index on 2 million rows is not large.

Let's look at the size breakdown of your table:

  • company_id - 4 Bytes
  • emp_number - 101 Bytes
  • card_date - 11 Bytes
  • total_ot - 17 Bytes
  • total_per - 17 Bytes
  • leave_taken - 9 Bytes
  • total_lop - 9 Bytes
  • time_entry - avg(length(time_entry)) + 3 Bytes

This gives us a row length of 172 + time_entry bytes. If time_entry averages out at 100 bytes. You're looking at 272 * 2000000 = 544MB

Of significance to me is the number of VARCHARs. Does employee number need to be a varchar(100), or even a varchar at all? You're duplicating that data in it's entirety in your index on (company_id,emp_number,card_date) as you're indexing the whole column.

You probably don't need a varchar here, and you possibly don't need it included in the primary key.

Do you really need time_entry to be a TEXT field? This is likely the biggest consumer of space in your database.

Why are you using varchar(10) for card date? If you used DATETIME you'd only use 8 Bytes instead of 11, TIMESTAMP would be 4 Bytes, and DATE would be 3 Bytes.

You're also adding 1 Byte for every column that can be NULL.

Also try running ANALYZE/REPAIR/OPTIMIZE TABLE commands as well.

Upvotes: 8

Nelson
Nelson

Reputation: 29716

A lot depends on how big that time_entry text field can be. I'm going to assume it's small, less than 100 bytes. Then you have roughly 4 + 100 + 10 + 100 + 15 + 15 + 8 + 8 = roughly 300 bytes of data per record. You have 2 million records. I'd expect the database to be 600 megabytes. In fact you are showing 8000 megabytes of data in the MYD on disk, or a factor of 12x. Something's not right.

Your best diagnostic tool is show table status. In particular check Avg_row_length and Data_length, they will give you some insight into where the space is going.

If you're using MyISAM tables, you may find that myisamchk will help make the table smaller. This tool particularly helps if you inserted and then deleted a lot of rows from the database. "optimize table" can help too. MyISAM does support read-only compressed tables via myisampack. I'd treat that as a last resort, though.

Upvotes: 3

Related Questions