Reputation: 2917
I am trying to optimize a query that currently takes 0.00x s on a MySQL 5.x DB in order to retrieve data on a system without load.
The query looks like this:
SELECT
a.article_id,
GROUP_CONCAT(attr_f.attr_de) AS functions,
GROUP_CONCAT(attr_n.attr_de) AS miscellaneous
FROM `articles_test` a
LEFT JOIN articles_attr AS f ON a.article_id = f.article_id AND f.attr_group_id = 26
LEFT JOIN articles_attr AS attr ON a.article_id = attr.article_id AND attr.attr_group_id = 27
LEFT JOIN cat_attr AS attr_f ON attr_f.attr_id = f.attr_id
LEFT JOIN cat_attr AS attr_n ON attr_n.attr_id = attr.attr_id
WHERE a.article_id = 11
EXPLAIN returns
1 SIMPLE a
NULL
const article_id article_id 3 const 1 100.00
NULL
1 SIMPLE f
NULL
ref article_id_2,article_id article_id_2 6 const,const 2 100.00 Using index
1 SIMPLE attr
NULL
ref article_id_2,article_id article_id_2 6 const,const 4 100.00 Using index
1 SIMPLE attr_f
NULL
ref attr_id attr_id 3 test.f.attr_id 1 100.00
NULL
1 SIMPLE attr_n
NULL
ref attr_id attr_id 3 test.attr.attr_id 1 100.00
NULL
There are indexes on all fields that are queried. Is there another way to retrieve the data with a simpler and faster query?
CREATE TABLE `articles_attr` (
`date_created` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
`article_id` mediumint(8) unsigned NOT NULL,
`attr_group_id` mediumint(8) NOT NULL,
`attr_id` mediumint(8) unsigned DEFAULT NULL,
`value` varchar(255) DEFAULT NULL,
UNIQUE KEY `article_id_2` (`article_id`,`attr_group_id`,`attr_id`),
KEY `article_id` (`article_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
CREATE TABLE `cat_attr` (
`attr_id` mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
`attr_group_id` mediumint(8) unsigned NOT NULL,
`sort` tinyint(4) NOT NULL,
`attr_de` varchar(255) NOT NULL,
UNIQUE KEY `attr_id` (`attr_id`,`attr_group_id`),
UNIQUE KEY `attr_group_id` (`attr_group_id`,`attr_de`)
) ENGINE=InnoDB AUTO_INCREMENT=380 DEFAULT CHARSET=utf8
CREATE TABLE `articles_test` (
`article_id` mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
UNIQUE KEY `article_id` (`article_id`),
) ENGINE=InnoDB AUTO_INCREMENT=221614 DEFAULT CHARSET=latin1
Table articles_attr contains about 0.5 million rows.
Upvotes: 0
Views: 69
Reputation: 142298
`attr_id` mediumint(8) unsigned DEFAULT NULL,
Why NULL
? Don't you always need an attr? The reason I am bringing this up is that you do not have an explicit PRIMARY KEY
on articles_attr
. The NULL
prevents promoting the UNIQUE
key to PK. Change to NOT NULL
and promote the UNIQUE
to PK.
KEY `article_id` (`article_id`)
Redundant, Drop it.
The structure of the many:many tables is sub-optimal. Several tips: http://mysql.rjweb.org/doc.php/index_cookbook_mysql#many_to_many_mapping_table
If you don't need "many:many", switch to "1:many"; it is more efficient.
You can probably use JOIN
instead of LEFT JOIN
since you need to get all the way to attr_f
and attr_n
.
Moving the Joins for the Group_concats into the SELECT
may help:
SELECT a.article_id,
(
SELECT GROUP_CONCAT(ca.attr_de)
FROM articles_attr AS aa
JOIN cat_attr AS ca USING(attr_id)
WHERE aa.attr_group_id = 26
AND aa.article_id = a.article_id
) AS functions,
(
SELECT GROUP_CONCAT(attr_f.attr_de)
FROM ..
JOIN ..
WHERE ..
) AS miscellaneous
FROM `articles_test` a
WHERE a.article_id = 11
But perhaps the most important is to avoid making the already-bad EAV schema design worse by normalizing the attributes!. That is, get rid of the table cat_attr
, and move attr_de
into articles_attr
. This will cut in half the number of JOINs
.
Upvotes: 0
Reputation: 350252
Since your WHERE
clause specifies the value of article_id
, there is no real need to let the select
clause return it. It is better to remove it, also because it does not comply with SQL standards that say that if you have an aggregation (group_concat
) all non-aggregating expressions in the select
clause must be in the group by
clause. But doing that (as in your question's first version) would give some overhead. So better remove it.
As the WHERE
condition is on the primary key and you don't need any data from the articles_test
table, you could omit the articles_test
table, and put the WHERE
condition on the foreign keys instead.
Finally, there is a kind of Cartesian join, as you combine every hit in attr_f
with every hit in attr_n
. This could lead to some duplicates in the group_concat
outputs and represents a performance hit.
If it is OK to have such duplicates removed, then maybe you would have a better performance by splitting the query into to groups: one for the function output, one for the miscellaneous output. The group is then formed by attr_group_id
.
This will also allow for turning the outer joins into inner joins.
So the output would be the unpivoted version of what you were after:
SELECT attr.attr_group_id, GROUP_CONCAT(cat.attr_de) AS functions
FROM articles_attr AS attr
INNER JOIN cat_attr AS cat ON cat.attr_id = attr.attr_id
WHERE attr.article_id = 11
AND attr.attr_group_id IN (26, 27)
GROUP BY attr.attr_group_id
So now the output will have two rows. The one with 26 in the first column will list the functions in the second column, and the one with 27 in the first column will list the miscellaneous.
True, the output format is different, but I think you will be able to rework the code that uses this query, while benefiting from the performance increase (which I expect).
If you need the pivoted version, then use a case when
expression:
SELECT GROUP_CONCAT(CASE attr.attr_group_id WHEN 26 THEN cat.attr_de END) AS functions,
GROUP_CONCAT(CASE attr.attr_group_id WHEN 27 THEN cat.attr_de END) AS miscellaneous
FROM articles_attr AS attr
INNER JOIN cat_attr AS cat ON cat.attr_id = attr.attr_id
WHERE attr.article_id = 11
AND attr.attr_group_id IN (26, 27)
Upvotes: 1
Reputation: 108651
First of all, 9ms for a query like this isn't bad. There's no radical improvement to be had. You may be able to squeeze another millisecond or two from the query, and you may not.
Your three-column index on articles_attr
looks OK. You might try switching the order of the first two columns in the index to see if you get better performance.
As it is, your single-column index on that table is unnecessary: that indexing functionality is provided because the same column comes first in your three-column index. Dropping that index probably won't help your query performance, but it will help insert performance.
GROUP_CONCAT()
makes sense here. It's perfectly valid to aggregate an entire result set. You might add GROUP BY a.article_id
just for clarity; it won't make any performance difference because you already choose only a single value of that column.
On cat_attr
, a compound index on (attr_id, attr_de)
may help. But that's apparently a small table, so it won't help too much.
Do you need the LEFT JOIN
operations to join articles_attr
to cat_attr
? Or, by the structure of your data, is every value of articles_attr.attr_id
guaranteed to find a match in cat_attr.attr_id
. If you can change those LEFT JOIN
operations to JOIN
s you may get a slight speedup.
Upvotes: 0