Aleksandr Usarov
Aleksandr Usarov

Reputation: 91

How to optimize SQL query that uses GROUP BY and joined many-to-many relation tables?

I have tables with many-to-many relations:

CREATE TABLE `item` (
  `id` mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(100) NOT NULL DEFAULT '',
  `size_id` tinyint(3) NOT NULL DEFAULT 0,
  PRIMARY KEY (`id`),
  INDEX `size` (`size_id`)
);

CREATE TABLE `items_styles` (
  `style_id` smallint(5) unsigned NOT NULL,
  `item_id` mediumint(8) unsigned NOT NULL,
  PRIMARY KEY (`item_id`, `style_id`),
  INDEX `style` (`style_id`),
  INDEX `item` (`item_id`),
  CONSTRAINT `items_styles_item_id_item_id` FOREIGN KEY (`item_id`) REFERENCES `item` (`id`)
);

CREATE TABLE `items_themes` (
  `theme_id` tinyint(3) unsigned NOT NULL,
  `item_id` mediumint(8) unsigned NOT NULL,
  PRIMARY KEY (`item_id`, `theme_id`),
  INDEX `theme` (`theme_id`),
  INDEX `item` (`item_id`),
  CONSTRAINT `items_themes_item_id_item_id` FOREIGN KEY (`item_id`) REFERENCES `item` (`id`)
);

I'm trying to get the report that shows style_id and the number of items that use this style but with applying filters to the item table and/or to another table, like this:

SELECT i_s.style_id, COUNT(i.id) total FROM item i
JOIN items_themes i_t ON i.id = i_t.item_id AND i_t.theme_id IN (6, 7)
JOIN items_styles i_s ON i.id = i_s.item_id
GROUP BY i_s.style_id;

-- or like this
SELECT i_s.style_id, COUNT(i.id) total FROM item i
JOIN items_themes i_t ON i.id = i_t.item_id AND i_t.theme_id IN (6, 7)
JOIN items_styles i_s ON i.id = i_s.item_id
WHERE i.size_id != 3
GROUP BY i_s.style_id;

enter image description here

enter image description here

The problem is that tables are pretty big so queries take a long time to execute (~8 seconds)

item - 8M rows items_styles - 12M rows items_themes - 11M rows

Is there any way to optimize these queries? If not, what approach can be used to receive such reports. I will be grateful for any help. Thanks.

Upvotes: 2

Views: 185

Answers (2)

Rick James
Rick James

Reputation: 142298

In a many-to-many table, it is optimal to have these two indexes:

  PRIMARY KEY (`item_id`, `style_id`),
  INDEX `style` (`style_id`, `item_id`)

And be sure to use InnoDB.

More discussion: http://mysql.rjweb.org/doc.php/index_cookbook_mysql#many_to_many_mapping_table

Still, you have two many-to-many mappings, so there probably is no great solution.

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1269753

First, you don't need the items table for the queries. Probably doesn't have much impact on performance, but no need.

So you can write the query as:

SELECT i_s.style_id, COUNT(*) as total
FROM items_themes i_t JOIN
     items_styles i_s 
     ON i_s.item_id = i_t.item_id
WHERE i_t.theme_id IN (6, 7)
GROUP BY i_s.style_id;

For this query, you want an index on items_themes(theme_id, item_id). There is no much you can do about the GROUP BY.

Then, I don't think this is what you really want, because it will double count an item that has both themes. So, use EXISTS instead:

SELECT i_s.style_id, COUNT(*) as total
FROM items_styles i_s          
WHERE EXISTS (SELECT 
              FROM items_themes i_t
              WHERE i_t.item_id = i_s.item_id AND
                    i_t.theme_id IN (6, 7)
             )
GROUP BY i_s.style_id;

For this, you want an index on items_themes(item_id, theme_id). You can also try an index on items_styles(style_id). Some databases might be able to use that one, but I am guessing not MariaDB.

Upvotes: 3

Related Questions