merlin
merlin

Reputation: 2917

How to optimize query on normalized database structure?

I am trying to optimize a query that currently takes 0.00x s on a MySQL 5.x DB in order to retrieve data on a system without load.

The query looks like this:

SELECT 
   a.article_id,
   GROUP_CONCAT(attr_f.attr_de) AS functions, 
   GROUP_CONCAT(attr_n.attr_de) AS miscellaneous
FROM `articles_test` a
LEFT JOIN articles_attr AS f ON a.article_id = f.article_id AND f.attr_group_id = 26
LEFT JOIN articles_attr AS attr ON a.article_id = attr.article_id AND attr.attr_group_id = 27
LEFT JOIN cat_attr AS attr_f ON attr_f.attr_id = f.attr_id
LEFT JOIN cat_attr AS attr_n ON attr_n.attr_id = attr.attr_id
WHERE a.article_id = 11

EXPLAIN returns

1   SIMPLE  a   
    NULL
    const   article_id  article_id  3   const   1   100.00  
    NULL

1   SIMPLE  f   
    NULL
    ref article_id_2,article_id article_id_2    6   const,const 2   100.00  Using index 
1   SIMPLE  attr    
    NULL
    ref article_id_2,article_id article_id_2    6   const,const 4   100.00  Using index 
1   SIMPLE  attr_f  
    NULL
    ref attr_id attr_id 3   test.f.attr_id  1   100.00  
    NULL

1   SIMPLE  attr_n  
    NULL
    ref attr_id attr_id 3   test.attr.attr_id   1   100.00  
    NULL

There are indexes on all fields that are queried. Is there another way to retrieve the data with a simpler and faster query?

CREATE TABLE `articles_attr` (
 `date_created` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
 `article_id` mediumint(8) unsigned NOT NULL,
 `attr_group_id` mediumint(8) NOT NULL,
 `attr_id` mediumint(8) unsigned DEFAULT NULL,
 `value` varchar(255) DEFAULT NULL,
 UNIQUE KEY `article_id_2` (`article_id`,`attr_group_id`,`attr_id`),
 KEY `article_id` (`article_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 

CREATE TABLE `cat_attr` (
 `attr_id` mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
 `attr_group_id` mediumint(8) unsigned NOT NULL,
 `sort` tinyint(4) NOT NULL,
 `attr_de` varchar(255) NOT NULL,
 UNIQUE KEY `attr_id` (`attr_id`,`attr_group_id`),
 UNIQUE KEY `attr_group_id` (`attr_group_id`,`attr_de`)
) ENGINE=InnoDB AUTO_INCREMENT=380 DEFAULT CHARSET=utf8

CREATE TABLE `articles_test` (
 `article_id` mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
 UNIQUE KEY `article_id` (`article_id`),
) ENGINE=InnoDB AUTO_INCREMENT=221614 DEFAULT CHARSET=latin1

Table articles_attr contains about 0.5 million rows.

Upvotes: 0

Views: 69

Answers (3)

Rick James
Rick James

Reputation: 142298

`attr_id` mediumint(8) unsigned DEFAULT NULL,

Why NULL? Don't you always need an attr? The reason I am bringing this up is that you do not have an explicit PRIMARY KEY on articles_attr. The NULL prevents promoting the UNIQUE key to PK. Change to NOT NULL and promote the UNIQUE to PK.

KEY `article_id` (`article_id`)

Redundant, Drop it.

The structure of the many:many tables is sub-optimal. Several tips: http://mysql.rjweb.org/doc.php/index_cookbook_mysql#many_to_many_mapping_table

If you don't need "many:many", switch to "1:many"; it is more efficient.

You can probably use JOIN instead of LEFT JOIN since you need to get all the way to attr_f and attr_n.

Moving the Joins for the Group_concats into the SELECT may help:

SELECT  a.article_id, 
        (
        SELECT  GROUP_CONCAT(ca.attr_de)
            FROM  articles_attr AS aa
            JOIN  cat_attr AS ca USING(attr_id)
            WHERE  aa.attr_group_id = 26
              AND  aa.article_id = a.article_id
        ) AS functions, 
        (
        SELECT  GROUP_CONCAT(attr_f.attr_de)
            FROM  ..
            JOIN  ..
            WHERE  .. 
        ) AS miscellaneous
    FROM  `articles_test` a
    WHERE  a.article_id = 11

But perhaps the most important is to avoid making the already-bad EAV schema design worse by normalizing the attributes!. That is, get rid of the table cat_attr, and move attr_de into articles_attr. This will cut in half the number of JOINs.

Upvotes: 0

trincot
trincot

Reputation: 350252

Since your WHERE clause specifies the value of article_id, there is no real need to let the select clause return it. It is better to remove it, also because it does not comply with SQL standards that say that if you have an aggregation (group_concat) all non-aggregating expressions in the select clause must be in the group by clause. But doing that (as in your question's first version) would give some overhead. So better remove it.

As the WHERE condition is on the primary key and you don't need any data from the articles_test table, you could omit the articles_test table, and put the WHERE condition on the foreign keys instead.

Finally, there is a kind of Cartesian join, as you combine every hit in attr_f with every hit in attr_n. This could lead to some duplicates in the group_concat outputs and represents a performance hit.

If it is OK to have such duplicates removed, then maybe you would have a better performance by splitting the query into to groups: one for the function output, one for the miscellaneous output. The group is then formed by attr_group_id.

This will also allow for turning the outer joins into inner joins.

So the output would be the unpivoted version of what you were after:

SELECT     attr.attr_group_id, GROUP_CONCAT(cat.attr_de) AS functions
FROM       articles_attr AS attr 
INNER JOIN cat_attr AS cat ON cat.attr_id = attr.attr_id
WHERE      attr.article_id = 11
       AND attr.attr_group_id IN (26, 27) 
GROUP BY   attr.attr_group_id

So now the output will have two rows. The one with 26 in the first column will list the functions in the second column, and the one with 27 in the first column will list the miscellaneous.

True, the output format is different, but I think you will be able to rework the code that uses this query, while benefiting from the performance increase (which I expect).

If you need the pivoted version, then use a case when expression:

SELECT     GROUP_CONCAT(CASE attr.attr_group_id WHEN 26 THEN cat.attr_de END) AS functions,
           GROUP_CONCAT(CASE attr.attr_group_id WHEN 27 THEN cat.attr_de END) AS miscellaneous
FROM       articles_attr AS attr 
INNER JOIN cat_attr AS cat ON cat.attr_id = attr.attr_id
WHERE      attr.article_id = 11
       AND attr.attr_group_id IN (26, 27) 

Upvotes: 1

O. Jones
O. Jones

Reputation: 108651

First of all, 9ms for a query like this isn't bad. There's no radical improvement to be had. You may be able to squeeze another millisecond or two from the query, and you may not.

Your three-column index on articles_attr looks OK. You might try switching the order of the first two columns in the index to see if you get better performance.

As it is, your single-column index on that table is unnecessary: that indexing functionality is provided because the same column comes first in your three-column index. Dropping that index probably won't help your query performance, but it will help insert performance.

GROUP_CONCAT() makes sense here. It's perfectly valid to aggregate an entire result set. You might add GROUP BY a.article_id just for clarity; it won't make any performance difference because you already choose only a single value of that column.

On cat_attr, a compound index on (attr_id, attr_de) may help. But that's apparently a small table, so it won't help too much.

Do you need the LEFT JOIN operations to join articles_attr to cat_attr? Or, by the structure of your data, is every value of articles_attr.attr_id guaranteed to find a match in cat_attr.attr_id. If you can change those LEFT JOIN operations to JOINs you may get a slight speedup.

Upvotes: 0

Related Questions