Dwight
Dwight

Reputation: 12450

MySQL exists subquery performs significantly worse in 8.0 than 5.7

I'm looking to upgrade my MySQL database from 5.7.33 to 8.0.28. I use it with a Laravel app that generates queries using a whereHas method that uses an exists subquery. However, I've noticed that performance has substantially reduced between the two versions - from ~5ms to 4s.

Here are the queries:

select * from `categories` where exists (select * from `jobs` inner join `category_job` on `jobs`.`id` = `category_job`.`job_id` where `categories`.`id` = `category_job`.`category_id`) order by `name` asc

What is interesting if that if I remove the exists subquery entirely, the query returns to a reasonable speed again - around the 5ms mark.

select * from `categories` order by `name` asc

What I'm trying to understand is why the performances changes so substantially between the two versions - if there is an error in my SQL that could be improved (or perhaps, changed in the framework to avoid the problem) - or this is simply a performance regression in MySQL that I'll have to live with.

I get that the first query is more work and going to be slower with the exists subquery, but I don't understand how simply upgrading MySQL has had thie effect.


When I run the query with EXPLAINS I get different results between 5.7 and 8.0 as well (note also that the databases have the same indexes):

5.7:

id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 PRIMARY categories ALL 27 100.00 Using where; Using filesort
2 DEPENDENT SUBQUERY category_job ref category_job_category_id_index,category_job_job_id_index category_job_category_id_index 8 jobsnearme.categories.id 35253 100.00
2 DEPENDENT SUBQUERY jobs eq_ref PRIMARY PRIMARY 8 jobsnearme.category_job.job_id 1 100.00 Using index

8.0:

id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE categories NULL ALL PRIMARY NULL NULL NULL 30 100.00 Using filesort
1 SIMPLE NULL eq_ref <auto_distinct_key> <auto_distinct_key> 8 jobsnearme.categories.id 1 100.00 NULL
2 MATERIALIZED jobs NULL index PRIMARY jobs_company_id_index 9 NULL 718220 100.00 Using index
2 MATERIALIZED category_job NULL ref category_job_category_id_index,category_job_job_id_index category_job_job_id_index 8 jobsnearme.jobs.id 1 100.00 NULL

Here is the SHOW CREATE TABLE category_job from 8.0:

CREATE TABLE `category_job` (
  `id` bigint unsigned NOT NULL AUTO_INCREMENT,
  `category_id` bigint unsigned NOT NULL,
  `job_id` bigint unsigned NOT NULL,
  PRIMARY KEY (`id`),
  KEY `category_job_category_id_index` (`category_id`),
  KEY `category_job_job_id_index` (`job_id`),
  CONSTRAINT `category_job_category_id_foreign` FOREIGN KEY (`category_id`) REFERENCES `categories` (`id`),
  CONSTRAINT `category_job_job_id_foreign` FOREIGN KEY (`job_id`) REFERENCES `jobs` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=1070585 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci

Upvotes: 2

Views: 990

Answers (2)

Arie S.
Arie S.

Reputation: 63

Try to set semijoin=off in the optimizer_switch and check the EXPLAIN result

SET optimizer_switch = 'semijoin=off';

Upvotes: 2

Rick James
Rick James

Reputation: 142298

A common inefficiency in a many-to-many table schema.

This would be faster:

CREATE TABLE `category_job` (
  `category_id` bigint unsigned NOT NULL,
  `job_id` bigint unsigned NOT NULL,
  PRIMARY KEY (category_id, job_id),
  KEY (job_id, category_id)
) ENGINE=InnoDB

Upvotes: -2

Related Questions