Reputation: 21
I have a table named 'response_set' with following indexes (result of 'show create table response_set;'):
| response_set | CREATE TABLE `response_set` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`survey_id` int(11) NOT NULL DEFAULT '0',
`respondent_id` int(11) DEFAULT NULL,
`ext_ref` varchar(64) DEFAULT NULL,
`email_addr` varchar(128) DEFAULT NULL,
`ip` varchar(32) DEFAULT NULL,
`t` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`time_taken` int(11) DEFAULT NULL,
`category_id` int(11) DEFAULT NULL,
`duplicate` int(1) DEFAULT '0',
`email_group` varchar(30) DEFAULT NULL,
`external_email_id` int(11) DEFAULT NULL,
`geo_code_country` varchar(64) DEFAULT NULL,
`geo_code_country_code` varchar(2) DEFAULT NULL,
`terminated_survey` int(1) DEFAULT NULL,
`geo_code_region` varchar(128) DEFAULT NULL,
`geo_code_city` varchar(3) DEFAULT NULL,
`geo_code_area_code` varchar(3) DEFAULT NULL,
`geo_code_dma_code` varchar(3) DEFAULT NULL,
`restart_url` varchar(255) DEFAULT NULL,
`inset_list` varchar(1024) DEFAULT NULL,
`custom1` varchar(1024) DEFAULT NULL,
`custom2` varchar(1024) DEFAULT NULL,
`custom3` varchar(1024) DEFAULT NULL,
`custom4` varchar(1024) DEFAULT NULL,
`panel_member_id` int(11) DEFAULT NULL,
`external_id` int(11) DEFAULT NULL,
`weight` float DEFAULT NULL,
`custom5` varchar(1024) DEFAULT NULL,
`quota_overlimit` int(1) DEFAULT '0',
`panel_id` int(11) DEFAULT NULL,
`referer_url` varchar(255) DEFAULT NULL,
`referer_domain` varchar(64) DEFAULT NULL,
`user_agent` varchar(255) DEFAULT NULL,
`longitude` decimal(15,12) DEFAULT '0.000000000000',
`latitude` decimal(15,12) DEFAULT '0.000000000000',
`radius` decimal(7,2) DEFAULT '0.00',
`cx_business_unit_id` int(11) DEFAULT '0',
`survey_link_id` int(11) DEFAULT '0',
`data_quality_flag` int(1) DEFAULT '0',
`data_quality_score` double DEFAULT '0',
`extended_info_json` json DEFAULT NULL,
`updated_ts` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`channel` int(1) DEFAULT '0',
PRIMARY KEY (`id`),
KEY `panel_member_id` (`panel_member_id`),
KEY `panel_member_id_2` (`panel_member_id`),
KEY `email_group` (`email_group`),
KEY `email_group_2` (`email_group`),
KEY `survey_timestamp_idx` (`survey_id`,`t`),
KEY `cx_business_unit_id_idx` (`cx_business_unit_id`),
KEY `data_quality_flag_idx` (`data_quality_flag`),
KEY `data_quality_score_idx` (`data_quality_score`),
KEY `survey_timestamp_terminated_idx` (`survey_id`,`t`,`terminated_survey`),
KEY `survey_idx` (`survey_id`)
) ENGINE=InnoDB AUTO_INCREMENT=39759 DEFAULT CHARSET=utf8 |
Now I am executing the following query on a page to retrieve the response_set rows based on survey_id and order by id:
SELECT *
FROM response_set a
WHERE a.survey_id = 1602673827
ORDER BY a.id limit 100;
The issue is sometimes the query is taking more than 30 seconds to be executed and this behaviour is inconsistent (as it sometimes happen when order by a.id and sometimes when order by a.id DESC as the user can view the response sets in ascending or descending order on the page) for different survey_id.
There are approx 6.2 million records in the table and for the given survey_id (1602673827) there are 45,800 records. On using the EXPLAIN SELECT statement to understand the query execution plan, I got the following info:
+----+-------------+-------+------------+-------+------------------------------------------------------+---------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+------------------------------------------------------+---------+---------+------+------+----------+-------------+
| 1 | SIMPLE | a | NULL | index | survey_timestamp_idx,survey_timestamp_terminated_idx | PRIMARY | 4 | NULL | 6863 | 1.46 | Using where |
+----+-------------+-------+------------+-------+------------------------------------------------------+---------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
Now I am not able to understand that even though the indexes -> 'survey_timestamp_idx,survey_timestamp_terminated_idx' are present, why is MySQL not using the indexes and is opting for the full table scan. Also when i modify the query as follows:
SELECT *
FROM response_set a USE INDEX (survey_timestamp_idx)
WHERE a.survey_id = 1602673827
ORDER BY a.id limit 100;
The query execution time is reduced to 0.17 seconds. On doing the EXPLAIN for the modified query, I get the following info:
+----+-------------+-------+------------+------+----------------------+----------------------+---------+-------+-------+----------+---------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+----------------------+----------------------+---------+-------+-------+----------+---------------------------------------+
| 1 | SIMPLE | a | NULL | ref | survey_timestamp_idx | survey_timestamp_idx | 4 | const | 87790 | 100.00 | Using index condition; Using filesort |
+----+-------------+-------+------------+------+----------------------+----------------------+---------+-------+-------+----------+---------------------------------------+
1 row in set, 1 warning (0.00 sec)
However, I don't want to explicitly use 'USE INDEX' in the query as the where clause is dynamic and may contain following combinations in where clause as per the user's choice of filter:
1. where survey_id = ?;
2. where survey_id = ? and t = ?; (t is timestamp)
3. where survey_id = ? and terminated_survey = ?;
4. where survey_id = ? and t = ? and terminated_survey = ?;
Also, if I remove the ORDER BY clause from the query, the query always uses index and gets executed very fast.
Is there any other way, so that the MySQL query engine chooses the correct (faster) execution plan (by using correct indexes) when ORDER BY clause is present in query?
I am using MySQL version : 5.7.22
I have read the MySQL official documentation for ORDER BY query optimization (https://dev.mysql.com/doc/refman/5.5/en/order-by-optimization.html) and tried adding composite index on (id, survey_id) and (survey_id, id) but it didn't work. Can somebody please help?
Upvotes: 2
Views: 556
Reputation: 142540
Assuming you have ORDER BY id ASC (or DESC)
, then you need 4 indexes to handle all of them optimally. Start with the 1, 2, or 3 columns (in any order) mentioned in the WHERE
, then finish with id
.
I cannot explain why KEY survey_idx
(survey_id
) was not used for the query in question, nor was that index a "possible_key" in the EXPLAIN
. It is as if something changed between running the queries and posting this Question. Please recheck.
BTW, INT(1)
still takes 4 bytes; you probably wanted the one-byte TINYINT UNSIGNED
. Many of the other fields are bigger than necessary. Size plays into performance, at least a little.
0.17s -- Might be even faster with FORCE INDEX(survey_idx)
Starting with the PRIMARY KEY
(as in (id, survey_id)
) is almost always useless. An index should start things that are tested with =
, then move onto something tested as a range or a GROUP BY
or (as in your case), ORDER BY
.
Cookbook: http://mysql.rjweb.org/doc.php/index_cookbook_mysql
Upvotes: 1