Reputation: 1670
I've got a Percona MySQL 5.6.13 database with tables like this:
CREATE TABLE `table1` (
`table1_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`created_at` datetime NOT NULL,
PRIMARY KEY (`table1_id`),
KEY `created_at` (`created_at`)
) ENGINE=InnoDB;
CREATE TABLE `table2` (
`table1_id` int(10) unsigned NOT NULL,
`cost` decimal(6,2) NOT NULL DEFAULT '0.00',
KEY `table1_id` (`table1_id`)
) ENGINE=InnoDB;
CREATE TABLE `table3` (
`table1_id` int(10) unsigned NOT NULL,
`partner` enum('partner1', 'partner2', 'partner3', 'partner4') NOT NULL DEFAULT 'partner1',
KEY `table1_id` (`table1_id`)
) ENGINE=InnoDB;
Each of the tables has about 1.5 million rows.
When I run the following query, it takes 18 seconds every time:
SELECT t3.partner, SUM(t2.cost) AS cost
FROM table1 t1
JOIN table2 t2 ON t1.table1_id = t2.table1_id
JOIN table3 t3 ON t1.table1_id = t3.table1_id
WHERE t1.created_at >= '2005-07-01' AND t1.created_at < '2008-09-20'
GROUP BY 1;
If I denormalize the cost / partner fields to table1, like so:
ALTER TABLE table1
  ADD `cost` decimal(6,2) NOT NULL DEFAULT '0.00',
  ADD `partner` enum('partner1', 'partner2', 'partner3', 'partner4') NOT NULL DEFAULT 'partner1',
  ADD KEY `partner` (`partner`);
UPDATE table1 t1 JOIN table2 t2 ON t1.table1_id = t2.table1_id SET t1.cost = t2.cost;
UPDATE table1 t1 JOIN table3 t3 ON t1.table1_id = t3.table1_id SET t1.partner = t3.partner;
and then run this query:
SELECT t1.partner, SUM(t1.cost) AS cost
FROM table1 t1
WHERE t1.created_at >= '2005-07-01' AND t1.created_at < '2008-09-20'
GROUP BY 1;
It takes 6 seconds the first time, then 2 seconds on each subsequent run (presumably because of MySQL caching).
What I'm hoping to find is some way of optimizing or caching the original query without denormalizing the data.
I can't just merge the tables (because of other fields not shown here, which I removed to keep this example accurate and testable). I could duplicate the data across the tables, but I'm not a big fan of that, and it seems like there should be a better solution.
Any database settings to try?
Perhaps NoSQL with more fully denormalized data -- would aggregation work reasonably fast there in this kind of scenario?
Thanks :)
P.S. One comment asked for the query plan. The WHERE clause selects all of the rows; I get the same results if I leave it off. Here is the query plan:
+----+-------------+-------+-------+--------------------+------------+---------+------------------------+--------+-----------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+--------------------+------------+---------+------------------------+--------+-----------------------------------------------------------+
| 1 | SIMPLE | t1 | range | PRIMARY,created_at | created_at | 5 | NULL | 766380 | Using where; Using index; Using temporary; Using filesort |
| 1 | SIMPLE | t3 | ref | table1_id,partner | table1_id | 4 | lsfs_main.t1.table1_id | 1 | NULL |
| 1 | SIMPLE | t2 | ref | table1_id | table1_id | 4 | lsfs_main.t1.table1_id | 1 | NULL |
+----+-------------+-------+-------+--------------------+------------+---------+------------------------+--------+-----------------------------------------------------------+
Upvotes: 3
Views: 185
Reputation: 14549
You're missing primary keys for table2 and table3. I would suggest at least a multi-column primary key for table3 containing both columns, with table1_id in the first position. Since InnoDB tables are index-organized tables, this should reduce lookups significantly for table3: with such a primary key, MySQL can retrieve all the relevant data directly from the index without further lookups.
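A sketch of that change (an assumption here: each table1_id appears only once in table3, so the composite key is in fact unique):

```sql
-- Hypothetical: replace the plain secondary index with a composite
-- clustered primary key. Only valid if (table1_id, partner) is unique.
ALTER TABLE `table3`
  DROP KEY `table1_id`,
  ADD PRIMARY KEY (`table1_id`, `partner`);
```

Because InnoDB clusters rows by the primary key, the join on table1_id then reads partner straight out of the clustered index, and EXPLAIN should show "Using index" for t3.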
For table2 it's not as easy, because (table1_id, cost) is not unique.
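Although a primary key won't work there, a standard InnoDB alternative (not part of the answer above, just a common technique) is a covering secondary index, which gives the same read-from-index benefit without requiring uniqueness:

```sql
-- Hypothetical covering index: the join column plus cost, so that
-- SUM(t2.cost) can be computed entirely from the index without
-- touching the table rows. Duplicates are allowed in a secondary index.
ALTER TABLE `table2`
  DROP KEY `table1_id`,
  ADD KEY `table1_id_cost` (`table1_id`, `cost`);
```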
Upvotes: 1