Reputation: 3089
I have a table which will grow large over time, moreover I need only small amount of data say last 7 days.
I want to configure it such that the data of 7 days goes in one partition, and then in next. This way I would keep only two partitions and archive others.
I read about MySQL partitions here but the way in article to create partitions is that we specify all partitions while creating table only.
I am not sure is this the best way to do it where we add partitioning logic for long time.
Any ideas?
Upvotes: 2
Views: 2711
Reputation: 108420
There is no "automatic" partition management in MySQL. We have to run some specific SQL statements to add and drop partitions from a partitioned table.
We automated the task with a cron job which runs a MySQL PROCEDURE we wrote to drop (swap out) old partitions, and another PROCEDURE to add new partitions. The procedures are specific to a particular table.
Our table is partitioned by RANGE
on a TIMESTAMP
column. The partition expression is like UNIX_TIMESTAMP(col)
.
To add a new partition, we reorganize the MAXVALUE
partition, which is always (or should always be) empty, so the operation is very quick. We dynamically prepare and execute a statement of the form:
ALTER TABLE ourtable REORGANIZE PARTITION pmax
INTO ( PARTITION pn_name VALUES LESS THAN (UNIX_TIMESTAMP(pn_date))
, PARTITION pmax VALUES LESS THAN MAXVALUE)
To get a new date value for the new partition (pn_name), we take the partition_description
value from the second to last partition (the last partition is the MAXVALUE partition), and add 7 days to it to get the pn_date
string to use. We use that same value to generate the pn_name for the new partition. (We name the partitions following a pattern like this: p20161030
based on the date value in the partition_description e.g. UNIX_TIMESTAMP('2016-10-30')
.
(This information is obtained from a fairly involved query with a couple of references to information_schema.partitions
view.
With the other procedure to drop old partitions, we actually "swap out" the old partition to an archive table. (The archive table is later backed up, and dropped by a different task.)
The procedure basically runs a series of statements like this:
DROP TABLE IF EXISTS `_et` ;
CREATE TABLE `_et` LIKE `rdg_point_value` ;
ALTER TABLE `_et` REMOVE PARTITIONING ;
ALTER TABLE `ourtable` EXCHANGE PARTITION `oldest_partition` WITH TABLE `_et` ;
ALTER TABLE `ourtable` DROP PARTITION `oldest_partition` ;
RENAME TABLE `et` TO `archive_oldest_partition` ;
(I wish there was a cleaner way to create a new un-partitioned table, in a single statement, such as a a CREATE TABLE ... LIKE ... WITHOUT PARTITIONING
, but absent that, we settled on the two separate statements.)
Just dropping the oldest partition would be a simpler process.
To obtain information about the oldest partition, our query is probably overkill. But it's where most of the "magic" happens. Just to give you an idea of what that query looks like...
FROM information_schema.partitions p1
JOIN information_schema.partitions px
ON px.table_schema = 'ourdatabase'
AND px.table_name = 'ourtable'
AND px.partition_method = 'RANGE'
AND px.partition_expression = 'UNIX_TIMESTAMP(ourcol)'
AND px.partition_description = 'MAXVALUE'
WHERE p1.table_schema = 'ourdatabase'
AND p1.table_name = 'ourtable'
AND p1.partition_method = 'RANGE'
AND p1.partition_expression = 'UNIX_TIMESTAMP(ourcol)'
AND p1.partition_description <> 'MAXVALUE'
AND p1.partition_description + 0 <= UNIX_TIMESTAMP(DATE(NOW()) + INTERVAL -187 DAY)
AND p1.partition_ordinal_position = 1
You could probably get away with a simpler query. (Our query is designed to only return the "oldest" partition only if all of the timestamp values in it are at least six months old, and only if there is a MAXVALUE partition defined.
Each of the procedures use the current date to see if "its time" to add or drop a partition. (The amount of time forward and back is hardcoded into the queries in the procedure... the query returns 0 rows if its not time yet.
The procedures only need to be executed once per week, and we designed them so that any "extra" runs won't add or drop partitions outside of the specified time ranges.
We have the procedures scheduled to execute every day, and on most days, the procedure runs a query which returns zero rows, and exits. Only when the query returns a row is there any work to do.
Upvotes: 3
Reputation: 2617
Unfortunately, it'll be a fairly manual process. Your best bet is to create the partitions, week by week ahead of time, then have a job that runs periodically to archive the old data into the 'catchall' partition.
e.g. with:
PARTITION BY RANGE ( TO_DAYS(date) ) (
PARTITION pmin VALUES LESS THAN ( TO_DAYS('2016-10-02 00:00:00') ),
PARTITION p1 VALUES LESS THAN ( TO_DAYS('2016-10-09 00:00:00') ),
PARTITION p2 VALUES LESS THAN ( TO_DAYS('2016-10-16 00:00:00') ),
PARTITION p3 VALUES LESS THAN ( TO_DAYS('2016-10-23 00:00:00') ),
PARTITION pmax VALUES LESS THAN (MAXVALUE)
);
There's no real harm having a few empty partitions sitting there with higher dates then doing a 'shift' once a week. It'll be fast enough as long as when you change the partitioning definition, the data window shifts by the partition size.
Your job would do something like
ALTER TABLE x REORGANIZE PARTITION pmin,p1 INTO (
PARTITION pmin VALUES LESS THAN ('2016-10-09 00:00:00')
);
ALTER TABLE x
ADD PARTITION px VALUES LESS THAN ( TO_DAYS('2016-10-30 00:00:00') )
);
Upvotes: 2