Tum
Tum

Reputation: 7585

Subquery processing more rows than necessary

I am optimising my queries and found something I can't get my head around.

I am using the following query to select a bunch of categories, combining them with an alias from a table containing old and new aliases for categories:

SELECT `c`.`id` AS `category.id`, 
    (SELECT `alias`
    FROM `aliases`
    WHERE category_id = c.id
    AND `old` = 0
    AND `lang_id` = 1
    ORDER BY `id` DESC
    LIMIT 1) AS `category.alias`
FROM (`categories` AS c)
WHERE `c`.`status` =  1 AND `c`.`parent_id` =  '11';

There are only 2 categories with a value of 11 for parent_id, so it should look up 2 categories from the alias table.

enter image description here

Still if I use EXPLAIN it says it has to process 48 rows. The alias table contains 1 entry per category as well (in this case, it can be more). Everything is indexed and if I understand correctly therefore it should find the correct alias immediately.

enter image description here

Now here's the weird thing. When I don't compare the aliases by the categories from the conditions, but manually by the category ids the query returns, it does process only 1 row, as intended with the index.

So I replace WHERE category_id = c.id by WHERE category_id IN (37, 43) and the query gets faster:

enter image description here

The only thing I can think of is that the subquery isn't run over the results from the query but before some filtering is done. Any kind of explanation or help is welcome!

Edit: silly me, the WHERE IN doesn't work as it doesn't make a unique selection. The question still stands though!

Create table schema

CREATE TABLE `aliases` (
    `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
    `lang_id` int(2) unsigned NOT NULL DEFAULT '1',
    `alias` varchar(255) DEFAULT NULL,
    `product_id` int(10) unsigned DEFAULT NULL,
    `category_id` int(10) unsigned DEFAULT NULL,
    `brand_id` int(10) unsigned DEFAULT NULL,
    `page_id` int(10) unsigned DEFAULT NULL,
    `campaign_id` int(10) unsigned DEFAULT NULL,
    `old` tinyint(1) unsigned DEFAULT '0',
    PRIMARY KEY (`id`),
    KEY `product_id` (`product_id`),
    KEY `category_id` (`category_id`),
    KEY `page_id` (`page_id`),
    KEY `alias_product_id` (`product_id`,`alias`),
    KEY `alias_category_id` (`category_id`,`alias`),
    KEY `alias_page_id` (`page_id`,`alias`),
    KEY `alias_brand_id` (`brand_id`,`alias`),
    KEY `alias_product_id_old` (`alias`,`product_id`,`old`),
    KEY `alias_category_id_old` (`alias`,`category_id`,`old`),
    KEY `alias_brand_id_old` (`alias`,`brand_id`,`old`),
    KEY `alias_page_id_old` (`alias`,`page_id`,`old`),
    KEY `lang_brand_old` (`lang_id`,`brand_id`,`old`),
    KEY `id_category_id_lang_id_old` (`lang_id`,`old`,`id`,`category_id`)
) ENGINE=InnoDB AUTO_INCREMENT=112392 DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT;

Upvotes: 0

Views: 52

Answers (1)

Rick James
Rick James

Reputation: 142453

SELECT ...
    WHERE x=1 AND y=2
    ORDER BY id DESC
    LIMIT 1

will be performed in one of several ways.

Since you have not shown us the indexes you have (SHOW CREATE TABLE), I will cover some likely cases...

  • INDEX(x, y, id) -- This can find the last row for that condition, so it does not need to look at more than one row.
  • Some other index, or no index: Scan DESCending from the last id checking each row for x=1 AND y=2, stopping when (if) such a row is found.
  • Some other index, or no index: Scan the entire table, checking each row for x=1 AND y=2; collect them into a temp table; sort by id; deliver one row.

Some of the EXPLAIN clues:

  • Using where -- does not say much
  • Using filesort -- it did a sort, apparently for the ORDER BY. (It may have been entirely done in RAM; ignore 'file'.)
  • Using index condition (not "Using index") -- this indicates an internal optimization in which it can check the WHERE clause more efficiently than it used to in older versions.

Do not trust the "Rows" in EXPLAIN. Often they are reasonably correct, but sometimes they are off by orders of magnitude. Here is a better way to see "how much work" is being done in a rather fast query:

FLUSH STATUS;
SELECT ...;
SHOW SESSION STATUS LIKE 'Handler%';

With the CREATE TABLE, I may have suggestions on how to improve the index.

Upvotes: 1

Related Questions