Reputation: 546
We have converted all the GTFS architecture to Maria DB tables.
https://developers.google.com/transit/gtfs/examples/gtfs-feed
So we have tables like - stop - trip - stop_time - etc
Then we have an SQL query to find all the stops after the current stop so we use the following query
SELECT DISTINCT t2.stop_id
FROM (SELECT stop_id,
trip_id,
stop_sequence
FROM stop_time
WHERE stop_id IN :stopIds) t1
inner join (SELECT stop_id,
trip_id,
stop_headsign,
stop_sequence
FROM stop_time
WHERE trip_id IN (SELECT trip_id
FROM stop_time
WHERE stop_id IN :stopIds)) t2
ON t2.trip_id = t1.trip_id
AND t2.stop_sequence > t1.stop_sequence;
However, when I run this query for each stop to populate it once in a different table to use the result set later, unfortunately the CPU usage goes to 100%
I am not sure why, thanks in Advance.
Upvotes: 0
Views: 1069
Reputation: 142298
IN ( SELECT ... )
, depending on the version of MySQL/MariaDB may optimize extremely poorly (ie, CPU). Try to turn it into a JOIN
.
AND t2.stop_sequence > t1.stop_sequence
smells like the worst way to do "groupwise-max". Is that what it is part of? It is O(N*N). There are faster ways. The fastest that I have found are here. Depending on your requirements, it can be O(N) or better.
FROM ( SELECT ... ) JOIN ( SELECT ... )
also may optimize terribly poorly -- neither 'derived table' has an index, so you there will be multiple table scans (ie, CPU). Let's see EXPLAIN SELECT ...
to see if it says All
on one of the derived tables. To fix, consider creating one of the subqueries as a TEMPORARY TABLE
and providing a suitable index.
And, as already mentioned, if you are using MariaDB 10.2, consider completely rewriting the entire query with Windowing and/or CTE techniques.
Upvotes: 1