Reputation: 6663
I have an animals
table (~2.7m records) and a breeds
table (~2.7m records) that have a one to many relationship (one animal can have multiple breeds). I'm trying to query all distinct breeds for a specific species. As I'm not a SQL expert, my initial thought was to go with a simple SELECT DISTINCT breed ... JOIN
, but this query took about 10 seconds which seemed much longer than I'd expect. So I changed this to a SELECT DISTINCT ... WHERE EXISTS
subquery and it executed in about 100ms in 5.7, which is much more reasonable. But now I'm migrating to MySQL 8 and this exact query takes anywhere from 10-30 seconds. Here are the table definitions:
CREATE TABLE `animals` (
`id` int unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(150) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
`species` varchar(50) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
`sex` enum('Male','Female') CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
`dob` date DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `animals_name_index` (`name`),
KEY `animals_dob_index` (`dob`),
KEY `animals_sex_index` (`sex`),
KEY `animals_species_index` (`species`,`id`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=2807152 DEFAULT CHARSET=utf8mb3 COLLATE=utf8_unicode_ci
CREATE TABLE `animal_breeds` (
`animal_id` int unsigned DEFAULT NULL,
`breed` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
UNIQUE KEY `animal_breeds_animal_id_breed_unique` (`animal_id`,`breed`),
KEY `animal_breeds_breed_animal_id_index` (`breed`,`animal_id`) USING BTREE,
CONSTRAINT `animal_breeds_animal_id_foreign` FOREIGN KEY (`animal_id`) REFERENCES `animals` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
Here is the query I'm running:
SELECT SQL_NO_CACHE *
FROM
(
SELECT DISTINCT `breed`
FROM `animal_breeds`
) AS `subQuery`
WHERE
EXISTS (
SELECT `breed`
FROM `animal_breeds`
INNER JOIN `animals` ON `animals`.`id` = `animal_breeds`.`animal_id`
WHERE `animals`.`species` = 'Dog'AND `animal_breeds`.`breed` = `subQuery`.`breed`
);
Here are the two EXPLAIN
statements from 5.7 and 8.0
MySQL 5.7
284 rows in set, 1 warning (0.02 sec)
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | PRIMARY | <derived2> | NULL | ALL | NULL | NULL | NULL | NULL | 7775 | 100.00 | Using where |
3 | DEPENDENT SUBQUERY | animal_breeds | NULL | ref | animal_breeds_animal_id_breed_unique,animal_breeds_breed_animal_id_index | animal_breeds_breed_animal_id_index | 1022 | allBreeds.breed | 348 | 100.00 | Using where; Using index |
3 | DEPENDENT SUBQUERY | animals | NULL | eq_ref | PRIMARY,animals_species_index | PRIMARY | 4 | animal_breeds.animal_id | 1 | 50.00 | Using where |
2 | DERIVED | animal_breeds | NULL | range | animal_breeds_animal_id_breed_unique,animal_breeds_breed_animal_id_index | animal_breeds_breed_animal_id_index | 1022 | NULL | 7775 | 100.00 | Using index for group-by |
MySQL 8.0.27
284 rows in set, 1 warning (27.92 sec)
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | PRIMARY | <derived2> | NULL | ALL | NULL | NULL | NULL | NULL | 7776 | 100.00 | NULL |
1 | PRIMARY | <subquery3> | NULL | eq_ref | <auto_distinct_key> | <auto_distinct_key> | 1022 | allBreeds.breed | 1 | 100.00 | NULL |
3 | MATERIALIZED | animals | NULL | ref | PRIMARY,animals_species_index | animals_species_index | 153 | const | 1390666 | 100.00 | Using index |
3 | MATERIALIZED | animal_breeds | NULL | ref | animal_breeds_animal_id_breed_unique,animal_breeds_breed_animal_id_index | animal_breeds_animal_id_breed_unique | 5 | animals.id | 1 | 100.00 | Using index |
2 | DERIVED | animal_breeds | NULL | range | animal_breeds_animal_id_breed_unique,animal_breeds_breed_animal_id_index | animal_breeds_breed_animal_id_index | 1022 | NULL | 7776 | 100.00 | Using index for group-by |
Lastly, both of these databases are using the base docker image with no changes to the configuration. Although the query still runs poorly on an VPS running MySQL 8 with some tweaked settings. I also read through a thread about someone having a similar problem but the comments/answer didn't seem to help in my case.
Any help would be much appreciated!
EDIT:
Here is the execution plan for the SELECT DISTINCT ... JOIN
:
SELECT DISTINCT ab.breed
FROM animal_breeds ab
INNER JOIN animals a on a.id=ab.animal_id
WHERE a.species='Dog'
MySQL 5.7
284 rows in set (25.27 sec)
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | a | NULL | ref | PRIMARY,animals_species_index,id_species | animals_species_index | 153 | const | 1385271 | 100.00 | Using index; Using temporary |
1 | SIMPLE | ab | NULL | ref | animal_breeds_animal_id_breed_unique,animal_breeds_breed_animal_id_index | animal_breeds_animal_id_breed_unique | 5 | a.id | 1 | 100.00 | Using index |
MySQL 8.0
284 rows in set (29.45 sec)
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | a | NULL | ref | PRIMARY,animals_species_index,id_species | animals_species_index | 153 | const | 1390666 | 100.00 | Using index; Using temporary |
1 | SIMPLE | ab | NULL | ref | animal_breeds_animal_id_breed_unique,animal_breeds_breed_animal_id_index | animal_breeds_animal_id_breed_unique | 5 | a.id | 1 | 100.00 | Using index |
SELECT ab.breed
FROM animal_breeds ab
INNER JOIN animals a on a.id=ab.animal_id
WHERE a.species='Dog'
MySQL 5.7
2722722 rows in set (26.69 sec)
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | a | ref | PRIMARY,animals_species_index,id_species | animals_species_index | 153 | const | 1385271 | 100.00 | Using index | |
1 | SIMPLE | ab | ref | animal_breeds_animal_id_breed_unique | animal_breeds_animal_id_breed_unique | 5 | a.id | 1 | 100.00 | Using index |
MySQL 8.0
2722722 rows in set (32.49 sec)
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | a | NULL | ref | PRIMARY,animals_species_index,id_species | animals_species_index | 153 | const | 1390666 | 100.00 | Using index |
1 | SIMPLE | ab | NULL | ref | animal_breeds_animal_id_breed_unique | animal_breeds_animal_id_breed_unique | 5 | a.id | 1 | 100.00 | Using index |
Upvotes: 4
Views: 1278
Reputation: 15679
Filtering animals
before joining it to breeds
will improve performance (10x faster in some cases):
SELECT DISTINCT ab.breed
FROM animal_breeds ab
WHERE ab.animal_id IN (
SELECT a.id
FROM animals a
WHERE a.species = 'Dog');
Upvotes: 2
Reputation: 142198
Consider this:
animals
to pets
. (I think of "Fido" as a pet and "dog" as an animal. Because of this, it took me a long time to figure out the schema.)species
to the other table, now calledpet_species
. Though I cannot imagine a "cat" breed being called "retriever", it does make sense that (species, breed) is a hierarchical pair of terms.Moving species to the other table will lead to some changes to the queries. You could have it in both tables, though DB purists say that redundant information is a "no-no". I have not thought of a compromise between the two stands.
Upvotes: 0
Reputation: 553
Try to write query without inner join and start with a table that contains columns from where conditions. Here is the one of possible variants:
SELECT DISTINCT ab.breed
FROM animals a
LEFT JOIN animal_breeds ab on a.id = ab.animal_id
WHERE a.species = 'Dog'
Upvotes: 0