Jeff
Jeff

Reputation: 6663

MySQL WHERE EXISTS subquery performance between 5.7 and 8.0

I have an animals table (~2.7m records) and a breeds table (~2.7m records) that have a one to many relationship (one animal can have multiple breeds). I'm trying to query all distinct breeds for a specific species. As I'm not a SQL expert, my initial thought was to go with a simple SELECT DISTINCT breed ... JOIN, but this query took about 10 seconds which seemed much longer than I'd expect. So I changed this to a SELECT DISTINCT ... WHERE EXISTS subquery and it executed in about 100ms in 5.7, which is much more reasonable. But now I'm migrating to MySQL 8 and this exact query takes anywhere from 10-30 seconds. Here are the table definitions:

CREATE TABLE `animals` (
  `id` int unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(150) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
  `species` varchar(50) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
  `sex` enum('Male','Female') CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
  `dob` date DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `animals_name_index` (`name`),
  KEY `animals_dob_index` (`dob`),
  KEY `animals_sex_index` (`sex`),
  KEY `animals_species_index` (`species`,`id`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=2807152 DEFAULT CHARSET=utf8mb3 COLLATE=utf8_unicode_ci
CREATE TABLE `animal_breeds` (
  `animal_id` int unsigned DEFAULT NULL,
  `breed` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
  UNIQUE KEY `animal_breeds_animal_id_breed_unique` (`animal_id`,`breed`),
  KEY `animal_breeds_breed_animal_id_index` (`breed`,`animal_id`) USING BTREE,
  CONSTRAINT `animal_breeds_animal_id_foreign` FOREIGN KEY (`animal_id`) REFERENCES `animals` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci

Here is the query I'm running:

SELECT SQL_NO_CACHE * 
FROM
    (
        SELECT DISTINCT `breed` 
        FROM `animal_breeds`
    ) AS `subQuery`
WHERE
    EXISTS (
        SELECT `breed`
        FROM `animal_breeds`
        INNER JOIN `animals` ON `animals`.`id` = `animal_breeds`.`animal_id`
        WHERE `animals`.`species` = 'Dog'AND `animal_breeds`.`breed` = `subQuery`.`breed`
    );

Here are the two EXPLAIN statements from 5.7 and 8.0

MySQL 5.7

284 rows in set, 1 warning (0.02 sec)

id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 PRIMARY <derived2> NULL ALL NULL NULL NULL NULL 7775 100.00 Using where
3 DEPENDENT SUBQUERY animal_breeds NULL ref animal_breeds_animal_id_breed_unique,animal_breeds_breed_animal_id_index animal_breeds_breed_animal_id_index 1022 allBreeds.breed 348 100.00 Using where; Using index
3 DEPENDENT SUBQUERY animals NULL eq_ref PRIMARY,animals_species_index PRIMARY 4 animal_breeds.animal_id 1 50.00 Using where
2 DERIVED animal_breeds NULL range animal_breeds_animal_id_breed_unique,animal_breeds_breed_animal_id_index animal_breeds_breed_animal_id_index 1022 NULL 7775 100.00 Using index for group-by

MySQL 8.0.27

284 rows in set, 1 warning (27.92 sec)

id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 PRIMARY <derived2> NULL ALL NULL NULL NULL NULL 7776 100.00 NULL
1 PRIMARY <subquery3> NULL eq_ref <auto_distinct_key> <auto_distinct_key> 1022 allBreeds.breed 1 100.00 NULL
3 MATERIALIZED animals NULL ref PRIMARY,animals_species_index animals_species_index 153 const 1390666 100.00 Using index
3 MATERIALIZED animal_breeds NULL ref animal_breeds_animal_id_breed_unique,animal_breeds_breed_animal_id_index animal_breeds_animal_id_breed_unique 5 animals.id 1 100.00 Using index
2 DERIVED animal_breeds NULL range animal_breeds_animal_id_breed_unique,animal_breeds_breed_animal_id_index animal_breeds_breed_animal_id_index 1022 NULL 7776 100.00 Using index for group-by

Lastly, both of these databases are using the base docker image with no changes to the configuration. Although the query still runs poorly on an VPS running MySQL 8 with some tweaked settings. I also read through a thread about someone having a similar problem but the comments/answer didn't seem to help in my case.

Any help would be much appreciated!

EDIT:

Here is the execution plan for the SELECT DISTINCT ... JOIN:

SELECT DISTINCT ab.breed
FROM animal_breeds ab
INNER JOIN animals a on a.id=ab.animal_id
WHERE a.species='Dog'

MySQL 5.7

284 rows in set (25.27 sec)

id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE a NULL ref PRIMARY,animals_species_index,id_species animals_species_index 153 const 1385271 100.00 Using index; Using temporary
1 SIMPLE ab NULL ref animal_breeds_animal_id_breed_unique,animal_breeds_breed_animal_id_index animal_breeds_animal_id_breed_unique 5 a.id 1 100.00 Using index

MySQL 8.0

284 rows in set (29.45 sec)

id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE a NULL ref PRIMARY,animals_species_index,id_species animals_species_index 153 const 1390666 100.00 Using index; Using temporary
1 SIMPLE ab NULL ref animal_breeds_animal_id_breed_unique,animal_breeds_breed_animal_id_index animal_breeds_animal_id_breed_unique 5 a.id 1 100.00 Using index
SELECT ab.breed
FROM animal_breeds ab
INNER JOIN animals a on a.id=ab.animal_id
WHERE a.species='Dog'

MySQL 5.7

2722722 rows in set (26.69 sec)

id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE a ref PRIMARY,animals_species_index,id_species animals_species_index 153 const 1385271 100.00 Using index
1 SIMPLE ab ref animal_breeds_animal_id_breed_unique animal_breeds_animal_id_breed_unique 5 a.id 1 100.00 Using index

MySQL 8.0

2722722 rows in set (32.49 sec)

id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE a NULL ref PRIMARY,animals_species_index,id_species animals_species_index 153 const 1390666 100.00 Using index
1 SIMPLE ab NULL ref animal_breeds_animal_id_breed_unique animal_breeds_animal_id_breed_unique 5 a.id 1 100.00 Using index

Upvotes: 4

Views: 1278

Answers (3)

Italo Borssatto
Italo Borssatto

Reputation: 15679

Filtering animals before joining it to breeds will improve performance (10x faster in some cases):

SELECT  DISTINCT ab.breed
FROM    animal_breeds ab
WHERE   ab.animal_id IN (
                     SELECT a.id
                     FROM animals a
                     WHERE a.species = 'Dog');

Upvotes: 2

Rick James
Rick James

Reputation: 142198

Consider this:

  • Change the table name animals to pets. (I think of "Fido" as a pet and "dog" as an animal. Because of this, it took me a long time to figure out the schema.)
  • Move species to the other table, now calledpet_species. Though I cannot imagine a "cat" breed being called "retriever", it does make sense that (species, breed) is a hierarchical pair of terms.
  • Another confusion: Technically speaking a "dog" is "Canis familiaris", which is [technically] two terms 'genus' and 'species'.

Moving species to the other table will lead to some changes to the queries. You could have it in both tables, though DB purists say that redundant information is a "no-no". I have not thought of a compromise between the two stands.

Upvotes: 0

abler98
abler98

Reputation: 553

Try to write query without inner join and start with a table that contains columns from where conditions. Here is the one of possible variants:

SELECT DISTINCT ab.breed
FROM animals a
LEFT JOIN animal_breeds ab on a.id = ab.animal_id
WHERE a.species = 'Dog'

Upvotes: 0

Related Questions