btse
btse

Reputation: 8021

MySQL query performance slow

I am still new to SQL and I am trying to improve the performance of my query. I have been searching around and have come to the conclusion that using JOINS instead of so many WHERE INS would help improve my performance, but I am unsure of how I would convert my statement. This is my current statement.

SELECT stop_id, stop_name FROM stops WHERE stop_id IN (
       SELECT DISTINCT stop_id FROM stop_times WHERE trip_id IN (
              SELECT trip_id from trips WHERE route_id = <routeid> ));

It takes anywhere from 5-25 seconds to return the results which is unacceptable. I was hoping to get it below 1 second. If anyone was wondering the data is from a GTFS feed. The stops and trips tables have about ~10,000 rows each, while the stop_times table has ~900,000. I have created indexes at each of the columns I am using. Here is the output of EXPLAIN, and also what was used to create each table.

Thanks for any help and if you need any more info let me know!

+----+--------------------+------------+-----------------+------------------+---------+---------+------+------+-------------+
| id | select_type        | table      | type            | possible_keys    | key     | key_len | ref  | rows | Extra       |
+----+--------------------+------------+-----------------+------------------+---------+---------+------+------+-------------+
|  1 | PRIMARY            | stops      | ALL             | NULL             | NULL    | NULL    | NULL | 6481 | Using where |
|  2 | DEPENDENT SUBQUERY | stop_times | index_subquery  | stop_id          | stop_id | 63      | func |   63 | Using where |
|  3 | DEPENDENT SUBQUERY | trips      | unique_subquery | PRIMARY,route_id | PRIMARY | 62      | func |    1 | Using where |
+----+--------------------+------------+-----------------+------------------+---------+---------+------+------+-------------+


| stops | CREATE TABLE `stops` (
  `stop_id` varchar(20) NOT NULL,
  `stop_code` varchar(50) DEFAULT NULL,
  `stop_name` varchar(255) DEFAULT NULL,
  `stop_desc` varchar(255) DEFAULT NULL,
  `stop_lat` decimal(8,6) DEFAULT NULL,
  `stop_lon` decimal(8,6) DEFAULT NULL,
  `zone_id` int(11) DEFAULT NULL,
  `stop_url` varchar(255) DEFAULT NULL,
  `location_type` int(2) DEFAULT NULL,
  `parent_station` int(11) DEFAULT NULL,
  `wheelchair_boarding` int(2) DEFAULT NULL,
  PRIMARY KEY (`stop_id`),
  KEY `zone_id` (`zone_id`),
  KEY `stop_lat` (`stop_lat`),
  KEY `stop_lon` (`stop_lon`),
  KEY `location_type` (`location_type`),
  KEY `parent_station` (`parent_station`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 |


| stop_times | CREATE TABLE `stop_times` (
  `trip_id` varchar(20) DEFAULT NULL,
  `arrival_time` varchar(8) DEFAULT NULL,
  `arrival_time_seconds` int(11) DEFAULT NULL,
  `departure_time` varchar(8) DEFAULT NULL,
  `departure_time_seconds` int(11) DEFAULT NULL,
  `stop_id` varchar(20) DEFAULT NULL,
  `stop_sequence` int(11) DEFAULT NULL,
  `stop_headsign` varchar(50) DEFAULT NULL,
  `pickup_type` int(2) DEFAULT NULL,
  `drop_off_type` int(2) DEFAULT NULL,
  `shape_dist_traveled` varchar(50) DEFAULT NULL,
  KEY `trip_id` (`trip_id`),
  KEY `arrival_time_seconds` (`arrival_time_seconds`),
  KEY `departure_time_seconds` (`departure_time_seconds`),
  KEY `stop_id` (`stop_id`),
  KEY `stop_sequence` (`stop_sequence`),
  KEY `pickup_type` (`pickup_type`),
  KEY `drop_off_type` (`drop_off_type`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 |

| trips | CREATE TABLE `trips` (
  `route_id` varchar(20) DEFAULT NULL,
  `service_id` varchar(20) DEFAULT NULL,
  `trip_id` varchar(20) NOT NULL,
  `trip_headsign` varchar(255) DEFAULT NULL,
  `trip_short_name` varchar(255) DEFAULT NULL,
  `direction_id` tinyint(1) DEFAULT NULL,
  `block_id` int(11) DEFAULT NULL,
  `shape_id` varchar(50) DEFAULT NULL,
  PRIMARY KEY (`trip_id`),
  KEY `route_id` (`route_id`),
  KEY `service_id` (`service_id`),
  KEY `direction_id` (`direction_id`),
  KEY `block_id` (`block_id`),
  KEY `shape_id` (`shape_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 |

Upvotes: 0

Views: 502

Answers (1)

Andrew G
Andrew G

Reputation: 349

You're right in thinking that JOINS are usually faster than WHERE IN subqueries.

Try this:

SELECT T3.stop_id, T3.stop_name 
FROM trips AS T1
JOIN
stop_times AS T2
ON T1.trip_id=T2.trip_id AND route_id = <routeid>
JOIN stops AS T3
ON T2.stop_id=T3.stop_id
GROUP BY T3.stop_id, T3.stop_name

Upvotes: 3

Related Questions