Reputation: 6611
I have a fairly large table (~ 6 GB) and I have performance problems on this query:
SELECT f.*,
TIME_FORMAT(f.scheme, '%H:%i') as scheme,
TIME_FORMAT(f.actual, '%H:%i') as actual,
DATE_FORMAT(f.flight_date, '%d-%m-%Y') as flight_date_formatted,
a.iata
FROM flights_database f
LEFT JOIN airports a ON f.airport = a.airportNameClean
WHERE f.flight_date BETWEEN DATE_SUB(CURDATE(), INTERVAL 30 DAY)
AND DATE_ADD(CURDATE(), INTERVAL 2 DAY)
AND (f.flight_number LIKE 'New York%' OR f.airport LIKE 'New York%' OR f.airline LIKE 'New York%')
ORDER by f.flight_date DESC, f.flight_scheme DESC
LIMIT 50"
I've used EXPLAIN
and identified these underlying problems
I have an index on flight_date, flight_number, airport, airline, scheme
and it reports to use it.
But this query can still take ~30 seconds which off course is too much.
What probably would help is using some kind of subquery to replace the OR part. But how can I determine what type of search query (e.g. which column) I actually need to search on after running the subquery.
Ideas and tips appreciated.
Upvotes: 2
Views: 2164
Reputation: 928
I believe your current index isn't optimal for the query mainly because of the 'or' expression. You should create 3 indexes.
(flight_number, flight_date, schema)
(airport, flight_date, schema)
(airline, flight_date, schema)
Then change the query to use the three indexes. You could also play with it a bit and maybe prune each sub query by adding an order by and limit to 50 as well.
select flight.*,
TIME_FORMAT(flight.scheme, '%H:%i') as scheme,
TIME_FORMAT(flight.actual, '%H:%i') as actual,
DATE_FORMAT(flight.flight_date, '%d-%m-%Y') as flight_date_formatted,
a.iata
from (
select *
from (
select f.Id,
f.flight_date,
f.schema
from flights_database f
where f.flight_date between DATE_SUB(CURDATE(), INTERVAL 30 DAY)
and DATE_ADD(CURDATE(), INTERVAL 2 DAY)
and f.flight_number like 'New York%'
order by f.flight_date desc,
f.schema desc limit 50
union
select f.Id,
f.flight_date,
f.schema
from flights_database f
where f.flight_date between DATE_SUB(CURDATE(), INTERVAL 30 DAY)
and DATE_ADD(CURDATE(), INTERVAL 2 DAY)
and f.airline like 'New York%'
order by f.flight_date desc,
f.schema desc limit 50
union
select f.Id,
f.flight_date,
f.schema
from flights_database f
where f.flight_date between DATE_SUB(CURDATE(), INTERVAL 30 DAY)
and DATE_ADD(CURDATE(), INTERVAL 2 DAY)
and f.airport like 'New York%'
order by f.flight_date desc,
f.schema desc limit 50
) f1
order by f1.flight_date desc,
f.schema desc limit 50
) f2
inner join flights_database flight on f2.Id = flight.Id
left join airports a on flight.airport = a.airportNameClean;
Currently your or statement will expand to: [flight_date, flight_number], [flight_date, airline], [flight_date, airport]
So when the optimizer looks at your index it will match [flight_date, flight_number] to your current index [flight_date, flight_number, airport, airline, scheme] (notice how they start off the same), but when it encounters [flight_date, airline] there isn't an index to match this expression. So the optimizer would determine it would need to do a index scan or a table scan. Then it would encounter [flight_date, airport] again it will determine this to require a index scan or a table scan.
With the three new indexes and the new query it would match the three indexes to the three criteria and determine each would require an index seek (hopefully). Then we include 'scheme' to save the row lookup by id for all the rows matching the criteria.
Upvotes: 1