Reputation: 11
I was trying to check whether implementing MySQL database partitioning is beneficial for our application or not. I have heard a lot about the benefits of using partitioning for large number of records. But surprisingly, the response time of the application got reduced by 3 times when doing the load testing after partitioning was implemented. Could someone please help with the reason why this may happen?
Let me explain in detail:
Below is the DDL of the table when partitioning was ‘not’ in place.
CREATE TABLE `myTable` (
`column1` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`column2` char(3) NOT NULL,
`column3` char(3) NOT NULL,
`column4` char(2) NOT NULL,
`column5` smallint(4) unsigned NOT NULL,
`column6` date NOT NULL,
`column7` varchar(2) NOT NULL,
`column8` tinyint(3) unsigned NOT NULL COMMENT 'Seat Count Ranges from 0-9.',
`column9` varchar(2) NOT NULL,
`column10` varchar(4) NOT NULL,
`column11` char(2) NOT NULL,
`column12` datetime NOT NULL,
`column13` datetime DEFAULT NULL,
PRIMARY KEY (`column1`),
KEY `index1` (`column2`,`column3`,`column4`,`column5`,`column7`,`column6`),
KEY `index2` (`column2`,`column3`,`column6`,`column4`)
) ENGINE=InnoDB AUTO_INCREMENT=342024674 DEFAULT CHARSET=latin1;
And below is the DDL of the same table after implementing ‘Range’ partitioning based on a date field.
CREATE TABLE `myTable` (
`column1` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`column2` char(3) NOT NULL,
`column3` char(3) NOT NULL,
`column4` char(2) NOT NULL,
`column5` smallint(4) unsigned NOT NULL,
`column6` date NOT NULL,
`column7` varchar(2) NOT NULL,
`column8` tinyint(3) unsigned NOT NULL COMMENT 'Seat Count Ranges from 0-9.',
`column9` varchar(2) NOT NULL,
`column10` varchar(4) NOT NULL,
`column11` char(2) NOT NULL,
`column12` datetime NOT NULL,
`column13` datetime DEFAULT NULL,
PRIMARY KEY (`column1`,`column6`),
KEY `index1` (`column2`,`column3`,`column4`,`column5`,`column7`,`column6`),
KEY `index2` (`column2`,`column3`,`column6`,`column4`)
) ENGINE=InnoDB AUTO_INCREMENT=342024674 DEFAULT CHARSET=latin1
PARTITION BY RANGE COLUMNS(`column6`)
(PARTITION date_jul_11 VALUES LESS THAN ('2011-08-01') ENGINE = InnoDB,
PARTITION date_aug_11 VALUES LESS THAN ('2011-09-01') ENGINE = InnoDB,
PARTITION date_sep_11 VALUES LESS THAN ('2011-10-01') ENGINE = InnoDB,
PARTITION date_oct_11 VALUES LESS THAN ('2011-11-01') ENGINE = InnoDB,
PARTITION date_nov_11 VALUES LESS THAN ('2011-12-01') ENGINE = InnoDB,
PARTITION date_dec_11 VALUES LESS THAN ('2012-01-01') ENGINE = InnoDB,
PARTITION date_jan_12 VALUES LESS THAN ('2012-02-01') ENGINE = InnoDB,
PARTITION date_feb_12 VALUES LESS THAN ('2012-03-01') ENGINE = InnoDB,
PARTITION date_mar_12 VALUES LESS THAN ('2012-04-01') ENGINE = InnoDB,
PARTITION date_apr_12 VALUES LESS THAN ('2012-05-01') ENGINE = InnoDB,
PARTITION date_may_12 VALUES LESS THAN ('2012-06-01') ENGINE = InnoDB,
PARTITION date_jun_12 VALUES LESS THAN ('2012-07-01') ENGINE = InnoDB,
PARTITION date_jul_12 VALUES LESS THAN ('2012-08-01') ENGINE = InnoDB,
PARTITION date_aug_12 VALUES LESS THAN ('2012-09-01') ENGINE = InnoDB,
PARTITION date_sep_12 VALUES LESS THAN ('2012-10-01') ENGINE = InnoDB,
PARTITION date_oct_12 VALUES LESS THAN ('2012-11-01') ENGINE = InnoDB,
PARTITION date_nov_12 VALUES LESS THAN ('2012-12-01') ENGINE = InnoDB,
PARTITION date_dec_12 VALUES LESS THAN ('2013-01-01') ENGINE = InnoDB,
PARTITION date_jan_13 VALUES LESS THAN ('2013-02-01') ENGINE = InnoDB,
PARTITION date_feb_13 VALUES LESS THAN ('2013-03-01') ENGINE = InnoDB,
PARTITION date_mar_13 VALUES LESS THAN ('2013-04-01') ENGINE = InnoDB,
PARTITION date_apr_13 VALUES LESS THAN ('2013-05-01') ENGINE = InnoDB,
PARTITION date_may_13 VALUES LESS THAN ('2013-06-01') ENGINE = InnoDB,
PARTITION date_jun_13 VALUES LESS THAN ('2013-07-01') ENGINE = InnoDB,
PARTITION date_oth VALUES LESS THAN (MAXVALUE) ENGINE = InnoDB);
Below is a sample query which was used for doing the load testing to test the performance.
SELECT column8, column9
FROM myTable
WHERE column2 = ? AND column3 = ? AND column4 =? AND column5 = ? AND column7 = ? AND column6 = ?
LIMIT 1
The ?
above were replaced with real values present in the database for testing.
Please note that the number of records in ‘myTable’ table is around 342 million, and the number of test data used for doing the performance testing is about 2 million.
However, as I said, the performance after implementing partitioning was reduced by a shocking 3 times. Any idea what may have caused this?
Also, please let me know if doing any further change in the table structure or indexing may help resolve this issue.
Upvotes: 1
Views: 765
Reputation: 48387
Looking at this, there's several things I would consider.
The first, and most glaring issue is that the big benefit from partitioning comes when you spread your data across different devices (disks) - and there's no evidence of that from the code posted.
Next, your partitioning is hard coded to specific date ranges - hence you're going to have to come up with a better plan when date_oth starts to fill up.
AND column6 = ?
So you only tested the performance of data from single partition? At best this will be no faster than with all the data in one table.
As Nathan points out, you are partitioning by column 6 - but you don't have this at the front of any of your indexes, hence the DBMS must search the index in each partition to find the data - this is ilkely the reason why the performance is so poor. (I disagree that partitioning only helps range queries).
Upvotes: 0
Reputation: 1038
Remember, the goal of partitioning is to speed up queries where your query limits the number of partitions the result could be found in. I think the issues is the column6 = ?
in your test query. I'm guessing that requiring an exact value, rather than a range, for column6 reduces your result set to very few values. Therefore, in the process of narrowing down the partitions, you've already essentially found the result. And since the indexes are split across the multiple partitions, there is a cost to that narrowing process.
The kind of query you would expect to benefit from partitioning on column6 is one that returns a range of values, limited to a small number of partitions. For example, try something like this as a test query:
SELECT column8, column9
FROM myTable
WHERE column6 < ? AND column6 > ? AND column2 = ? AND column3 = ? AND column4 =? AND column5 = ?
where that column6 range spans around 2 partitions, and the total result count is expected to be reasonably large.
This might help: http://dev.mysql.com/tech-resources/articles/partitioning.html
Upvotes: 1