servabat
servabat

Reputation: 376

How to choose indexes

I'm trying to execute several times the following query :

SELECT st2.stop_id AS to_stop_id,
       TIME_TO_SEC(
       ADDTIME(TIMEDIFF(MIN(st1.time), %time),
       TIMEDIFF(st2.time, st2.time))) AS duration
FROM   stop_times st1,
       stop_times st2,
       trips tr,
       calendar cal
WHERE  tr.service_id   = cal.service_id
  AND  tr.trip_id      = st1.trip_id
  AND  st1.trip_id     = st2.trip_id
  AND  st1.stop_id     = %sid
  AND  st1.stop_seq +1 = st2.stop_seq
  AND  st1.time        > %time
  AND  DATE(NOW()) BETWEEN cal.start_date AND
  cal.end_date
GROUP BY st2.stop_id

However, it run extremely slow. I indexed the following attributes:

+------------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table      | Non_unique | Key_name   | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+------------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| stop_times |          0 | st_id      |            1 | st_id       | A         |    11431583 |     NULL | NULL   |      | BTREE      |         |               |
| stop_times |          1 | fk_tid_s   |            1 | trip_id     | A         |     1039234 |     NULL | NULL   | YES  | BTREE      |         |               |
| stop_times |          1 | st_per_sid |            1 | stop_id     | A         |       33135 |     NULL | NULL   | YES  | BTREE      |         |               |
| calendar   |          0 | PRIMARY    |            1 | service_id  | A         |        5206 |     NULL | NULL   |      | BTREE      |         |               |
| calendar   |          0 | PRIMARY    |            1 | service_id  | A         |        5206 |     NULL | NULL   |      | BTREE      |         |               |
| trips      |          0 | PRIMARY    |            1 | trip_id     | A         |      449489 |     NULL | NULL   |      | BTREE      |         |               |
| trips      |          1 | fk_rid     |            1 | route_id    | A         |        1937 |     NULL | NULL   | YES  | BTREE      |         |               |
| trips      |          1 | fk_sid     |            1 | service_id  | A         |        7749 |     NULL | NULL   | YES  | BTREE      |         |               |
+------------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

(For some reasons, st_id is not show as a PRIMARY KEY, but it is, I don't know if it's important but just in case..)

I ran SQL EXPLAIN on this query and it gave me the following answer :

+------+-------------+-------+--------+-------------------------------------------------+---------------------+---------+------------------------------+------+---------------------------------------------------------------------+
| id   | select_type | table | type   | possible_keys                                   | key                 | key_len | ref                          | rows | Extra                                                               |
+------+-------------+-------+--------+-------------------------------------------------+---------------------+---------+------------------------------+------+---------------------------------------------------------------------+
|    1 | SIMPLE      | st1   | range  | comp_uniq_st_seq,st_per_sid,comp_uniq_stid_time | comp_uniq_stid_time | 9       | NULL                         | 1396 | Using index condition; Using where; Using temporary; Using filesort |
|    1 | SIMPLE      | tr    | eq_ref | PRIMARY,fk_sid                                  | PRIMARY             | 8       | reseau_ratp.st1.trip_id      |    1 | Using where                                                         |
|    1 | SIMPLE      | cal   | eq_ref | PRIMARY,comp_sid_date_en,comp_sid_date_st       | PRIMARY             | 4       | reseau_ratp.tr.service_id    |    1 | Using where                                                         |
|    1 | SIMPLE      | st2   | ref    | comp_uniq_st_seq                                | comp_uniq_st_seq    | 14      | reseau_ratp.st1.trip_id,func |    1 | Using index condition                                               |
+------+-------------+-------+--------+-------------------------------------------------+---------------------+---------+------------------------------+------+---------------------------------------------------------------------+

What should I do to get this query running faster?

EDIT : Query using the requested syntax :

SELECT st2.stop_id AS to_stop_id,
       TIME_TO_SEC(
       ADDTIME(TIMEDIFF(MIN(st1.time), %time),
       TIMEDIFF(st2.time, st2.time))) AS duration

FROM   stop_times st1
  INNER JOIN stop_times st2
          ON st1.trip_id = st2.trip_id AND st1.stop_seq + 1 = st2.stop_seq
  INNER JOIN trips tr
          ON tr.trip_id = st1.trip_id
  INNER JOIN calendar cal
          ON tr.service_id = cal.service_id

WHERE  st1.stop_id     =  %sid
  AND  st1.time        >  %time
  AND  cal.start_date  <= NOW()
  AND  cal.end_date    >= NOW()

GROUP BY st2.stop_id

Here SHOW CREATE TABLE stop_times:

CREATE TABLE `stop_times` (
  `trip_id` bigint(10) unsigned DEFAULT NULL,
  `stop_id` int(10) DEFAULT NULL,
  `time` time DEFAULT NULL,
  `stop_seq` int(10) unsigned DEFAULT NULL,
  UNIQUE KEY `comp_uniq_st_seq` (`trip_id`,`stop_seq`),
  KEY `comp_uniq_stid_time` (`stop_id`,`time`),
  CONSTRAINT `fk_sid_s` FOREIGN KEY (`stop_id`) REFERENCES `stops` (`stop_id`),
  CONSTRAINT `fk_tid_s` FOREIGN KEY (`trip_id`) REFERENCES `trips` (`trip_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

For calendar :

CREATE TABLE `calendar` (
  `service_id` int(10) unsigned NOT NULL,
  `start_date` date DEFAULT NULL,
  `end_date` date DEFAULT NULL,
  PRIMARY KEY (`service_id`),
  KEY `comp_sid_date_en` (`service_id`,`end_date`),
  KEY `comp_sid_date_st` (`service_id`,`start_date`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

And for trips :

CREATE TABLE `trips` (
  `trip_id` bigint(10) unsigned NOT NULL DEFAULT '0',
  `route_id` int(10) unsigned DEFAULT NULL,
  `service_id` int(10) unsigned DEFAULT NULL,
  `trip_headsign` varchar(15) DEFAULT NULL,
  `trip_short_name` varchar(15) DEFAULT NULL,
  `direction_id` tinyint(1) DEFAULT NULL,
  PRIMARY KEY (`trip_id`),
  KEY `fk_rid` (`route_id`),
  KEY `fk_sid` (`service_id`),
  CONSTRAINT `fk_rid` FOREIGN KEY (`route_id`) REFERENCES `routes` (`route_id`),
  CONSTRAINT `fk_sid` FOREIGN KEY (`service_id`) REFERENCES `calendar` (`service_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

Upvotes: 0

Views: 73

Answers (1)

Rick James
Rick James

Reputation: 142296

st1 needs this composite index: INDEX(stop_id, time)

Please use the JOIN ... ON syntax.

Please provide SHOW CREATE TABLE.

Here is a Cookbook on creating INDEXes from a SELECT.

(Edit)

Calendar is trickier to handle, and there is no "good" index. These may help:

INDEX(service_id, start_time)
INDEX(service_id, end_time)

plus, reformulate AND DATE(NOW()) BETWEEN cal.start_date AND cal.end_date into

AND cal.start_date <= NOW()
AND cal.end_time   >= NOW()

(Edit 2)

Wherever practical, say NOT NULL. This is probably especially important in stop_times which does not have a PRIMARY KEY. Change the two columns in UNIQUE KEY comp_uniq_st_seq (trip_id,stop_seq) to be NOT NULL and turn it into PRIMARY KEY (trip_id, stop_seq). This will allow the performance benefits of "the PK is clustered with the data" to kick in.

Now that I see the CREATE TABLE for Calendar, and that service_id is the PRIMARY KEY, the two indexes I suggested for it are probably useless. (Again, this relates to "clustering".)

My Cookbook for building indexes may come in handy.

Upvotes: 1

Related Questions