Reputation: 17443
I have three tables, bug, bugrule and bugtrace, for which the relationships are:
bug 1--------N bugrule
id = bugid
bugrule 0---------N bugtrace
id = ruleid
Because I'm almost always interested in relations between bug <---> bugtrace, I have created an appropriate VIEW, which is used as part of several queries. Interestingly, queries using this VIEW perform significantly worse than equivalent queries using the underlying JOIN explicitly.
VIEW definition:
CREATE VIEW bugtracev AS
SELECT t.*, r.bugid
FROM bugtrace AS t
LEFT JOIN bugrule AS r ON t.ruleid=r.id
WHERE r.version IS NULL
Execution plan for a query using the VIEW (bad performance):
mysql> explain
SELECT c.id,state,
(SELECT COUNT(DISTINCT(t.id)) FROM bugtracev AS t
WHERE t.bugid=c.id)
FROM bug AS c
WHERE c.version IS NULL
AND c.id<10;
+----+--------------------+-------+-------+---------------+--------+---------+-----------------+---------+-----------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------+-------+---------------+--------+---------+-----------------+---------+-----------------------+
| 1 | PRIMARY | c | range | id_2,id | id_2 | 8 | NULL | 3 | Using index condition |
| 2 | DEPENDENT SUBQUERY | t | index | NULL | ruleid | 9 | NULL | 1426004 | Using index |
| 2 | DEPENDENT SUBQUERY | r | ref | id_2,id | id_2 | 8 | bugapp.t.ruleid | 1 | Using where |
+----+--------------------+-------+-------+---------------+--------+---------+-----------------+---------+-----------------------+
3 rows in set (0.00 sec)
Execution plan for a query using the underlying JOIN directly (good performance):
mysql> explain
SELECT c.id,state,
(SELECT COUNT(DISTINCT(t.id))
FROM bugtrace AS t
LEFT JOIN bugrule AS r ON t.ruleid=r.id
WHERE r.version IS NULL
AND r.bugid=c.id)
FROM bug AS c
WHERE c.version IS NULL
AND c.id<10;
+----+--------------------+-------+-------+---------------+--------+---------+-------------+--------+-----------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------+-------+---------------+--------+---------+-------------+--------+-----------------------+
| 1 | PRIMARY | c | range | id_2,id | id_2 | 8 | NULL | 3 | Using index condition |
| 2 | DEPENDENT SUBQUERY | r | ref | id_2,id,bugid | bugid | 8 | bugapp.c.id | 1 | Using where |
| 2 | DEPENDENT SUBQUERY | t | ref | ruleid | ruleid | 9 | bugapp.r.id | 713002 | Using index |
+----+--------------------+-------+-------+---------------+--------+---------+-------------+--------+-----------------------+
3 rows in set (0.00 sec)
CREATE TABLE statements (with irrelevant columns omitted) are:
mysql> show create table bug;
CREATE TABLE `bug` (
`id` bigint(20) NOT NULL,
`version` int(11) DEFAULT NULL,
`state` varchar(16) DEFAULT NULL,
UNIQUE KEY `id_2` (`id`,`version`),
KEY `id` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
mysql> show create table bugrule;
CREATE TABLE `bugrule` (
`id` bigint(20) NOT NULL,
`version` int(11) DEFAULT NULL,
`bugid` bigint(20) NOT NULL,
UNIQUE KEY `id_2` (`id`,`version`),
KEY `id` (`id`),
KEY `bugid` (`bugid`),
CONSTRAINT `bugrule_ibfk_1` FOREIGN KEY (`bugid`) REFERENCES `bug` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
mysql> show create table bugtrace;
CREATE TABLE `bugtrace` (
`id` bigint(20) NOT NULL,
`ruleid` bigint(20) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `ruleid` (`ruleid`),
CONSTRAINT `bugtrace_ibfk_1` FOREIGN KEY (`ruleid`) REFERENCES `bugrule` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
Upvotes: 3
Views: 139
Reputation: 17443
If you are using MySQL 5.6 or older, try again with MySQL 5.7 or later. According to What’s New in MySQL 5.7?:
We have to a large extent unified the handling of derived tables and views. Until now, subqueries in the FROM clause (derived tables) were unconditionally materialized, while views created from the same query expressions were sometimes materialized and sometimes merged into the outer query. This behavior, beside being inconsistent, can lead to a serious performance penalty.
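To see whether this applies to your server, something like the following may help. It is only a sketch: the `trace_count` alias is made up for illustration, and the `derived_merge` flag is only expected to show up in `optimizer_switch` on 5.7 and later.

```sql
-- Which server version is running?
SELECT VERSION();

-- On 5.7 and later the optimizer_switch string should contain a
-- derived_merge flag; merging is active when it reads derived_merge=on.
SELECT @@optimizer_switch;

-- Re-run EXPLAIN for the slow query on the newer server and compare the
-- join order with the plan posted in the question.
-- trace_count is a hypothetical alias, not part of the original query.
EXPLAIN
SELECT c.id, c.state,
       (SELECT COUNT(DISTINCT t.id)
        FROM bugtracev AS t
        WHERE t.bugid = c.id) AS trace_count
FROM bug AS c
WHERE c.version IS NULL
  AND c.id < 10;
```

If the plan still starts the subquery with a full index scan of bugtrace, the version difference is probably not the whole story.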
Upvotes: 0
Reputation: 108651
You are asking why the optimizer treats two complex queries, each with COUNT(DISTINCT val) and a dependent subquery, so differently. It's hard to know for sure.
You will probably fix most of your performance problem by getting rid of your dependent subquery, though. Try something like this:
SELECT c.id, c.state, COALESCE(cnt.cnt, 0) AS cnt
FROM bug AS c
LEFT JOIN (
SELECT bugid, COUNT(DISTINCT id) cnt
FROM bugtracev
GROUP BY bugid
) cnt ON c.id = cnt.bugid
WHERE c.version IS NULL
AND c.id<10;
Why does this help? To satisfy the query, the optimizer can choose to run the GROUP BY subquery just once, rather than once for each row of bug. You can also use EXPLAIN on the GROUP BY subquery by itself to understand its performance.
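For instance, the grouped subquery from the rewrite can be examined in isolation, roughly like this (a sketch using only the view and columns from the question):

```sql
-- Plan for the aggregate alone, without the outer join to bug.
-- Look at the join order of bugtrace and bugrule and at the rows estimates.
EXPLAIN
SELECT bugid, COUNT(DISTINCT id) AS cnt
FROM bugtracev
GROUP BY bugid;
```

Keep in mind that this subquery counts traces for every bug, not only those with id < 10, so it pays off mainly when the outer query touches many bugs.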
You may also get a performance boost by creating a compound index on bugrule that matches the query in your view. Note that the rule's key column in bugrule is id (ruleid exists only in bugtrace). Try this one:
CREATE INDEX bugrule_v ON bugrule (version, id, bugid)
and also try one with the last two columns switched, like so:
CREATE INDEX bugrule_v2 ON bugrule (version, bugid, id)
These indexes are called covering indexes because they contain all the bugrule columns needed to satisfy your query, so the rows themselves need not be read. version appears first because that helps optimize WHERE version IS NULL in your view definition.
Pro tip: Avoid using SELECT * in views and queries, especially when you have performance problems. Instead, list the columns you actually need. The * may prevent the query optimizer from using a covering index, even when the index would help.
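As a rough illustration, a variant of your view with an explicit column list could look like this. The view name `bugtracev_cols` is made up so that your existing view stays untouched, and the column list assumes that id, ruleid and bugid are all your queries need; your real bugtrace table probably has more columns than the reduced definition shows.

```sql
-- Hypothetical variant of bugtracev that names its columns instead of t.*.
-- Same join and filter as the original view definition.
CREATE VIEW bugtracev_cols AS
SELECT t.id, t.ruleid, r.bugid
FROM bugtrace AS t
LEFT JOIN bugrule AS r ON t.ruleid = r.id
WHERE r.version IS NULL;
```

Both t.id and t.ruleid are available from the ruleid index on an InnoDB table (secondary indexes carry the primary key), so the bugtrace side can be read from the index alone.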
Upvotes: 1