Reputation: 3192
I have been using a method oft seen for measuring table size using a query, e.g.:
INSERT INTO tableRecords (loadTime, dataFromDate, tableName, rowCount, sizeMB)
SELECT NOW(),
SUBDATE(CURDATE(),1),
'table_name',
COUNT(*),
(SELECT ROUND(((data_length + index_length) / 1024 / 1024), 2)
FROM information_schema.TABLES
WHERE table_schema = 'db_name' AND table_name = 'table_name')
FROM table_name
I've been running this daily for some time
However, I notice that often for days at a time the number stays the same, regardless of how many rows are added
Is there a better way to do this so that I can ensure I'm getting the current table size?
I am able to run multiple queries with the script I am using
Aware of the OPTIMIZE TABLE
command but I'm unsure if it's the right thing to use or exactly how/when to use it. Is it necessary to solve my problem? I could see it taking some time to run all of the OPTIMIZE TABLE
commands every day in my case (some large tables)
If that is the right way to go, should I just run this command daily prior to my INSERT
?
Are there any other options?
Upvotes: 0
Views: 179
Reputation: 562330
You should know that the number reported for table size in INFORMATIONS_SCHEMA.TABLES is only an estimate -- it can be off +/- 10% or more. It may also change from time to time, when MySQL updates its statistics.
This may be why the number doesn't seem to change even as you insert lots of rows. You can force the statistics to update with ANALYZE TABLE
.
Using SHOW TABLE STATUS is really just reading the same values in the INFORMATION_SCHEMA, so it has the same issue with precision.
Looking at the physical file size on disk is not accurate either, because the file can normally contain some amount of fragmentation, from past deletions or rows that don't fill up a given space in the file fully. Therefore, the real size of data is quite different from the physical size of the file.
This also means that the "size" of a table could mean different things. Is it the size of real rows of data? The size of the pages that data occupies? The size of the pages if they were defragmented? The size of the physical file on disk?
Why do you need such a precise measure of table size, anyway? Even if you could get a perfectly precise table size measurement, it would become obsolete as soon as you inserted another row. That's why an estimate is sufficient for most purposes.
As for when to use OPTIMIZE TABLE, I recommend to use it after I've done some operation that could cause significant fragmentation, like a series of many DELETE statements. Or periodically to defragment. How frequently depends on the level of activity on the table. A table with low traffic perhaps yearly would be enough. A table with a lot of inserts/updates/deletes maybe every couple of weeks. Doing it daily is likely to be overkill.
You're right that it locks your table and block activity for some time, the larger the table, the longer it needs. You can use pt-online-schema-change to allow your table restructure to happen in the background without blocking activity on the original table.
Also, MySQL 5.6 can do OPTIMIZE TABLE for InnoDB tables as online DDL, without locking.
Re your comment:
InnoDB updates statistics from time to time, but it's not impossible for it to go days between the auto-refresh of statistics. Here are some references that talk about this in more detail:
Upvotes: 2