Reputation: 1670
I've got a Percona MySQL 5.6.13 database with tables like this:
CREATE TABLE `table1` (
`table1_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`created_at` datetime NOT NULL,
PRIMARY KEY (`table1_id`),
KEY `created_at` (`created_at`)
) ENGINE=InnoDB;
CREATE TABLE `table2` (
`table1_id` int(10) unsigned NOT NULL,
`cost` decimal(6,2) NOT NULL DEFAULT '0.00',
KEY `table1_id` (`table1_id`)
) ENGINE=InnoDB;
CREATE TABLE `table3` (
`table1_id` int(10) unsigned NOT NULL,
`partner` enum('partner1', 'partner2', 'partner3', 'partner4') NOT NULL DEFAULT 'partner1',
KEY `table1_id` (`table1_id`)
) ENGINE=InnoDB;
Each of the tables has about 1.5 million rows.
When I run the following query, it takes 18 seconds every time:
SELECT t3.partner, SUM(t2.cost) AS cost
FROM table1 t1
JOIN table2 t2 ON t1.table1_id = t2.table1_id
JOIN table3 t3 ON t1.table1_id = t3.table1_id
WHERE t1.created_at >= '2005-07-01' AND t1.created_at < '2008-09-20'
GROUP BY 1;
If I denormalize the cost / partner fields to table1, like so:
ALTER TABLE table1
  ADD `cost` decimal(6,2) NOT NULL DEFAULT '0.00',
  ADD `partner` enum('partner1', 'partner2', 'partner3', 'partner4') NOT NULL DEFAULT 'partner1',
  ADD KEY `partner` (`partner`);
UPDATE table1 t1 JOIN table2 t2 ON t1.table1_id = t2.table1_id SET t1.cost = t2.cost;
UPDATE table1 t1 JOIN table3 t3 ON t1.table1_id = t3.table1_id SET t1.partner = t3.partner;
and then run this query:
SELECT t1.partner, SUM(t1.cost) AS cost
FROM table1 t1
WHERE t1.created_at >= '2005-07-01' AND t1.created_at < '2008-09-20'
GROUP BY 1;
It takes 6 seconds the first time, then 2 seconds on each subsequent run (presumably because of MySQL caching).
What I'm hoping to find is some way of optimizing or caching the original query without denormalizing the data.
I can't just merge the tables (because of other fields not shown here, which I removed to keep this example accurate and testable). I could duplicate the data across the tables, but I'm not a big fan of that, and it seems like there should be a better solution.
Any database settings to try?
Perhaps NoSQL with more fully denormalized data -- would aggregation work reasonably fast there in this kind of scenario?
Thanks :)
P.S. One comment asked for the query plan. The WHERE clause selects all of the rows; I get the same results if I leave it off. Here is the query plan:
+----+-------------+-------+-------+--------------------+------------+---------+------------------------+--------+-----------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+--------------------+------------+---------+------------------------+--------+-----------------------------------------------------------+
| 1 | SIMPLE | t1 | range | PRIMARY,created_at | created_at | 5 | NULL | 766380 | Using where; Using index; Using temporary; Using filesort |
| 1 | SIMPLE | t3 | ref | table1_id,partner | table1_id | 4 | lsfs_main.t1.table1_id | 1 | NULL |
| 1 | SIMPLE | t2 | ref | table1_id | table1_id | 4 | lsfs_main.t1.table1_id | 1 | NULL |
+----+-------------+-------+-------+--------------------+------------+---------+------------------------+--------+-----------------------------------------------------------+
Upvotes: 3
Views: 185
Reputation: 14549
You're missing primary keys for table2 and table3. I would suggest at least a multi-column primary key for table3 containing both columns, with table1_id in the first position. Since InnoDB tables are index-organized tables, this should reduce lookups significantly for table3: with such a primary key, MySQL can retrieve all the relevant data directly from the index without further lookups.
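A sketch of that change (an assumption here: each table1_id appears only once in table3, so the composite key is in fact unique):

```sql
-- Hypothetical: replace the plain secondary index with a composite
-- clustered primary key. Only valid if (table1_id, partner) is unique.
ALTER TABLE `table3`
  DROP KEY `table1_id`,
  ADD PRIMARY KEY (`table1_id`, `partner`);
```

Because InnoDB clusters rows by the primary key, the join on table1_id then reads partner straight out of the clustered index, and EXPLAIN should show "Using index" for t3.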
For table2 it's not as easy, because (table1_id, cost) is not unique.
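Although a primary key won't work there, a standard InnoDB alternative (not part of the answer above, just a common technique) is a covering secondary index, which gives the same read-from-index benefit without requiring uniqueness:

```sql
-- Hypothetical covering index: the join column plus cost, so that
-- SUM(t2.cost) can be computed entirely from the index without
-- touching the table rows. Duplicates are allowed in a secondary index.
ALTER TABLE `table2`
  DROP KEY `table1_id`,
  ADD KEY `table1_id_cost` (`table1_id`, `cost`);
```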
Upvotes: 1