Reputation: 12450
I'm looking to upgrade my MySQL database from 5.7.33 to 8.0.28. I use it with a Laravel app that generates queries using a whereHas
method that uses an exists subquery. However, I've noticed that performance has substantially reduced between the two versions - from ~5ms to 4s.
Here are the queries:
select * from `categories` where exists (select * from `jobs` inner join `category_job` on `jobs`.`id` = `category_job`.`job_id` where `categories`.`id` = `category_job`.`category_id`) order by `name` asc
What is interesting if that if I remove the exists subquery entirely, the query returns to a reasonable speed again - around the 5ms mark.
select * from `categories` order by `name` asc
What I'm trying to understand is why the performances changes so substantially between the two versions - if there is an error in my SQL that could be improved (or perhaps, changed in the framework to avoid the problem) - or this is simply a performance regression in MySQL that I'll have to live with.
I get that the first query is more work and going to be slower with the exists subquery, but I don't understand how simply upgrading MySQL has had thie effect.
When I run the query with EXPLAINS I get different results between 5.7 and 8.0 as well (note also that the databases have the same indexes):
5.7:
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | PRIMARY | categories | ALL | 27 | 100.00 | Using where; Using filesort | |||||
2 | DEPENDENT SUBQUERY | category_job | ref | category_job_category_id_index,category_job_job_id_index | category_job_category_id_index | 8 | jobsnearme.categories.id | 35253 | 100.00 | ||
2 | DEPENDENT SUBQUERY | jobs | eq_ref | PRIMARY | PRIMARY | 8 | jobsnearme.category_job.job_id | 1 | 100.00 | Using index |
8.0:
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | categories | NULL | ALL | PRIMARY | NULL | NULL | NULL | 30 | 100.00 | Using filesort |
1 | SIMPLE | NULL | eq_ref | <auto_distinct_key> | <auto_distinct_key> | 8 | jobsnearme.categories.id | 1 | 100.00 | NULL | |
2 | MATERIALIZED | jobs | NULL | index | PRIMARY | jobs_company_id_index | 9 | NULL | 718220 | 100.00 | Using index |
2 | MATERIALIZED | category_job | NULL | ref | category_job_category_id_index,category_job_job_id_index | category_job_job_id_index | 8 | jobsnearme.jobs.id | 1 | 100.00 | NULL |
Here is the SHOW CREATE TABLE category_job
from 8.0:
CREATE TABLE `category_job` (
`id` bigint unsigned NOT NULL AUTO_INCREMENT,
`category_id` bigint unsigned NOT NULL,
`job_id` bigint unsigned NOT NULL,
PRIMARY KEY (`id`),
KEY `category_job_category_id_index` (`category_id`),
KEY `category_job_job_id_index` (`job_id`),
CONSTRAINT `category_job_category_id_foreign` FOREIGN KEY (`category_id`) REFERENCES `categories` (`id`),
CONSTRAINT `category_job_job_id_foreign` FOREIGN KEY (`job_id`) REFERENCES `jobs` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=1070585 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
Upvotes: 2
Views: 990
Reputation: 63
Try to set semijoin=off
in the optimizer_switch and check the EXPLAIN result
SET optimizer_switch = 'semijoin=off';
Upvotes: 2
Reputation: 142298
A common inefficiency in a many-to-many table schema.
This would be faster:
CREATE TABLE `category_job` (
`category_id` bigint unsigned NOT NULL,
`job_id` bigint unsigned NOT NULL,
PRIMARY KEY (category_id, job_id),
KEY (job_id, category_id)
) ENGINE=InnoDB
Upvotes: -2