Doug Wolfgram

Reputation: 2126

Can I set up MySQL to auto-partition?

I want to partition a very large table. As the business is growing, partitioning by date isn't really that good because each year the partitions get bigger and bigger. What I'd really like is a partition for every 10 million records.

The MySQL manual shows this simple example:

CREATE TABLE employees (
    id INT NOT NULL,
    fname VARCHAR(30),
    lname VARCHAR(30),
    hired DATE NOT NULL DEFAULT '1970-01-01',
    separated DATE NOT NULL DEFAULT '9999-12-31',
    job_code INT NOT NULL,
    store_id INT NOT NULL
)
PARTITION BY RANGE (store_id) (
    PARTITION p0 VALUES LESS THAN (6),
    PARTITION p1 VALUES LESS THAN (11),
    PARTITION p2 VALUES LESS THAN (16),
    PARTITION p3 VALUES LESS THAN MAXVALUE
);

But this means that everything with a store_id of 16 or more gets thrown into the last partition. Is there a way to auto-generate a new partition at every interval (in my case, every 10 million records) so I won't have to keep modifying an active database? I am running MySQL 5.5.

Thanks!

EDIT: Here is my actual table

CREATE TABLE `my_table` (
    `row_id` int(11) NOT NULL AUTO_INCREMENT,
    `filename` varchar(50) DEFAULT NULL,
    `timestamp` datetime DEFAULT NULL,
    `unit_num` int(3) DEFAULT NULL,
    `string` int(3) DEFAULT NULL,
    `voltage` float(6,4) DEFAULT NULL,
    `impedance` float(6,4) DEFAULT NULL,
    `amb` float(6,2) DEFAULT NULL,
    `ripple_v` float(8,6) DEFAULT NULL,
    PRIMARY KEY (`row_id`),
    UNIQUE KEY `timestamp` (`timestamp`,`filename`,`string`,`unit_num`),
    KEY `index1` (`filename`),
    KEY `index2` (`timestamp`),
    KEY `index3` (`timestamp`,`filename`,`string`),
    KEY `index4` (`filename`,`unit_num`)
) ENGINE=MyISAM AUTO_INCREMENT=690892041 DEFAULT CHARSET=latin1

and an example query for the graph is...

SELECT DATE_FORMAT(timestamp,'%Y/%m/%d %H:%i:%s') AS mytime, voltage, impedance, amb, ripple_v, unit_num
FROM my_table
WHERE timestamp >= DATE_SUB('2015-07-31 00:05:59', INTERVAL 90 DAY)
  AND filename = 'dlrphx10s320upsab3' AND unit_num = '5' AND string = '2'
ORDER BY timestamp ASC;

Here is the explain for the query...

mysql> explain SELECT DATE_FORMAT(timestamp,'%Y/%m/%d %H:%i:%s') as mytime,voltage,impedance,amb,ripple_v,unit_num FROM my_table WHERE timestamp >= DATE_SUB('2015-07-31 00:05:59', INTERVAL 90 DAY) AND filename = 'dlrphx10s320upsab3' and unit_num='5' and string='2' ORDER BY timestamp asc;
+----+-------------+------------+------+-------------------------+--------+---------+-------------+-------+----------------------------------------------------+
| id | select_type | table      | type | possible_keys           | key    | key_len | ref         | rows  | Extra                                              |
+----+-------------+------------+------+-------------------------+--------+---------+-------------+-------+----------------------------------------------------+
|  1 | SIMPLE      | my_table   | ref  | timestamp,index3,index4 | index4 | 58      | const,const | 13440 | Using index condition; Using where; Using filesort |
+----+-------------+------------+------+-------------------------+--------+---------+-------------+-------+----------------------------------------------------+

Upvotes: 5

Views: 3752

Answers (2)

Rick James

Reputation: 142538

(This answer is directed at the schema and SELECT.)

Since you anticipate millions of rows, first I want to point out some improvements to the schema.

  • FLOAT(m,n) is usually the 'wrong' thing to do because it leads to two roundings. Either use plain FLOAT (which seems 'right' for metrics like voltage) or use DECIMAL(m,n). FLOAT is 4 bytes; in the cases given, DECIMAL would be 3 or 4 bytes.

  • When you have both INDEX(a) and INDEX(a,b), the former is unnecessary, since the latter can handle any lookup the former can. You have 3 unnecessary KEYs (index1, index2, and index3 are each a prefix of another key). This slows down INSERTs.

  • INT(3) -- Are you saying a "3-digit number"? If so consider TINYINT UNSIGNED (values 0..255) for 1 byte instead of INT for 4 bytes. This will save many MB of disk space, hence speed. (See also SMALLINT, etc, and SIGNED or UNSIGNED.)

  • If filename is repeated a lot, you may want to "normalize" it. This would save many MB.

  • Use NOT NULL unless you need NULL for something.

  • AUTO_INCREMENT=690892041 implies that you are about 1/3 of the way to disaster with row_id, which will top out at about 2 billion. Do you use row_id for anything? Getting rid of the column would avoid the issue; then change the UNIQUE KEY to the PRIMARY KEY. (If you do need row_id, let's talk further.)

  • ENGINE=MyISAM -- Switching to InnoDB has some ramifications, both favorable and unfavorable. The table would become 2-3 times as big. The 'right' choice of PRIMARY KEY would further speed up this SELECT significantly. (And it may or may not slow down other SELECTs.) See the sketch after this list.
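
Pulling those bullets together, here is one possible shape for the revised schema. This is only a sketch: the files lookup table, its name, and the chosen integer sizes are assumptions for illustration, not a tested design.

-- Hypothetical lookup table so each repeated filename is stored once.
CREATE TABLE files (
    file_id  SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,
    filename VARCHAR(50) NOT NULL,
    PRIMARY KEY (file_id),
    UNIQUE KEY (filename)
) ENGINE=InnoDB;

CREATE TABLE my_table (
    file_id   SMALLINT UNSIGNED NOT NULL,  -- replaces filename; assumes < 64K distinct files
    `timestamp` DATETIME NOT NULL,
    unit_num  TINYINT UNSIGNED NOT NULL,   -- was INT(3): 1 byte instead of 4
    string    TINYINT UNSIGNED NOT NULL,   -- was INT(3)
    voltage   FLOAT NOT NULL,              -- plain FLOAT avoids the double rounding of FLOAT(6,4)
    impedance FLOAT NOT NULL,
    amb       FLOAT NOT NULL,
    ripple_v  FLOAT NOT NULL,
    PRIMARY KEY (file_id, unit_num, string, `timestamp`)  -- replaces row_id; see the SELECT discussion below
) ENGINE=InnoDB;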

A note on the SELECT: Since string and unit_num are constants in the query, the last two fields of ORDER BY timestamp asc, string asc, unit_num asc are unnecessary. If they are relevant for reasons not apparent in the SELECT, then my advice may be incomplete.

This

WHERE filename = 'foobar'
  AND unit_num='40'
  AND string='2' 
  AND timestamp >= ...

is optimally handled by INDEX(filename, unit_num, string, timestamp). The order of the columns is not important except that timestamp needs to be last. Rearranging the current UNIQUE key gives you the optimal index. (Meanwhile, none of the existing indexes is very good for this SELECT.) Making it the PRIMARY KEY and the table InnoDB would make it even faster.
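
A sketch of that change, assuming row_id really can be dropped and that filename, unit_num, string, and timestamp contain no NULLs (a PRIMARY KEY forces them NOT NULL):

ALTER TABLE my_table
    DROP PRIMARY KEY,
    DROP COLUMN row_id,
    DROP INDEX `timestamp`,   -- superseded: the new PK covers the same columns
    ADD PRIMARY KEY (filename, unit_num, string, `timestamp`),
    ENGINE=InnoDB;

On a table with ~690M rows this ALTER is a full rebuild, so plan for considerable downtime or use an online-schema-change tool.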

Partitioning? No advantage. Not for performance; not for anything else you have mentioned. A common use for partitioning is purging 'old' data. If you intend to do such, let's talk further.

In huge tables it is best to look at all the important SELECTs simultaneously so that we don't speed up one while demolishing the speed of others. It may even turn out that partitioning helps in this kind of tradeoff.

Upvotes: 4

Rick James

Reputation: 142538

First, I must ask: what benefit does Partitioning give you? Is there some query that runs faster because of it?

There is no auto-partitioning.

Instead, you should have a daily job that counts the rows in the 'last active' partition to see whether it is approaching 10M. If so, add another partition.
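
A minimal sketch of that check via INFORMATION_SCHEMA (the database name here is hypothetical; note that TABLE_ROWS is exact for MyISAM but only an estimate for InnoDB):

SELECT PARTITION_NAME, TABLE_ROWS
FROM INFORMATION_SCHEMA.PARTITIONS
WHERE TABLE_SCHEMA = 'mydb'        -- hypothetical database name
  AND TABLE_NAME = 'my_table'
ORDER BY PARTITION_ORDINAL_POSITION;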

I recommend keeping the "last" partition (the one with MAXVALUE) empty. That way you can REORGANIZE PARTITION to split it into two empty partitions with essentially zero overhead. And I recommend that instead of ADD PARTITION because you might slip up and put something in the last partition.
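
For example (partition names and boundary values here are hypothetical), splitting an empty MAXVALUE partition touches no data, so it is essentially instant:

ALTER TABLE my_table
REORGANIZE PARTITION pmax INTO (
    PARTITION p5   VALUES LESS THAN (50000000),  -- next 10M-row boundary
    PARTITION pmax VALUES LESS THAN MAXVALUE     -- stays empty
);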

It is unclear what will trigger 10M. Are there multiple rows for each store_id? And are there new rows coming in for each store? If so, then partitioning on store_id is problematic, since all partitions will be growing all the time.

OK, so store_id was just a lame example from the reference manual. Please provide SHOW CREATE TABLE so we can talk concrete, not hand-waving. There are simply too many directions this task could take.

What is the activity?

If you mostly hit the "recent" partition(s), then an uneven distribution may be warranted -- periodically add a new partition and combine an adjacent pair of old partitions. (I did this successfully in one system.)
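
Combining an adjacent pair also uses REORGANIZE PARTITION; another sketch with hypothetical names, where the merged partition takes the upper bound of the later of the two:

ALTER TABLE my_table
REORGANIZE PARTITION p1, p2 INTO (
    PARTITION p1_2 VALUES LESS THAN (20000000)  -- old p2's upper bound
);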

If you will be purging "old" data, then clearly you need to use PARTITION BY RANGE(TO_DAYS(...)) and use DROP PARTITION plus REORGANIZE PARTITION.
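
A sketch of that layout (partition names and dates are illustrative; note that every UNIQUE key, including the PRIMARY KEY, must contain the partitioning column):

ALTER TABLE my_table
PARTITION BY RANGE (TO_DAYS(`timestamp`)) (
    PARTITION p201506 VALUES LESS THAN (TO_DAYS('2015-07-01')),
    PARTITION p201507 VALUES LESS THAN (TO_DAYS('2015-08-01')),
    PARTITION pmax    VALUES LESS THAN MAXVALUE
);

-- Purging the oldest month is then a nearly instant metadata operation:
ALTER TABLE my_table DROP PARTITION p201506;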

And there are lots of other scenarios. But I know of only 4 scenarios where Partitioning provides any performance benefit. See my blog.

Upvotes: 0
