Reputation:
I have a MySQL table called devicelog with it's PK on id
, but multiple indices on device_id
(INT), field_id
(INT), and unixtime
(BIGINT). They are just the default InnoDB indices.
I'm trying to get the ID next to a certain time, I get WAY different performance with different values and different ORDER BYs. IDs and unixtimes both have a positive association, since they both are increasing in order as more data gets inserted, so it seems like it would be okay to safely omit ordering on unixtime. My table has around 25 million records and performance is extremely vital.
This query is fairly slow (~0.5 seconds): Edit: after using USE INDEX(unixtime)
, I was able to increase performance quite a bit (<0.01 seconds!).
SELECT
id
FROM
devicelog
USE INDEX(unixtime) /* edit: looking at the EXPLAIN, I can use this index and it sped things up a bit */
WHERE
device_id = 26
AND field_id = 64
AND unixtime >= 1397166634707 /* a fairly recent time */
/* with no ORDER BY clause, this query is surprisingly slow */
LIMIT 1
EXPLAIN:
1, SIMPLE, devicelog, index_merge, device_id,field_id,field_id_2,unixtime, field_id,device_id, 8,8, , 6667, Using intersect(field_id,device_id); Using where
This query is extremely fast (<0.01 seconds):
SELECT
id
FROM
devicelog
WHERE
device_id = 26
AND field_id = 64
AND unixtime >= 1397166634707 /* a fairly recent time */
ORDER BY unixtime ASC /* <- using unixtime to order */
LIMIT 1
EXPLAIN:
1, SIMPLE, devicelog, range, device_id,field_id,field_id_2,unixtime, unixtime, 9, , 897776, Using index condition; Using where
How would omitting an ORDER BY decrease performance? It seems logical to think that it would increase speed.
Yet, if I change the unixtime to something far back, to "1", it will completely slow down when I use the ORDER BY unixtime. I believe the unixtime index is ordered ascendingly, so this doesn't make much sense either.
This query performs in an opposite manner as the queries above.
Extremely fast (<0.01 seconds):
SELECT
id
FROM
devicelog
WHERE
device_id = 26
AND field_id = 64
AND unixtime >= 1 /* a long time ago */
LIMIT 1
EXPLAIN:
1, SIMPLE, devicelog, index_merge, device_id,field_id,field_id_2,unixtime, field_id,device_id, 8,8, , 6742, Using intersect(field_id,device_id); Using where
This query is the exact same as the fast one, except it's using an older time:
EXTREMELY slow (~7 seconds):
SELECT
id
FROM
devicelog
WHERE
device_id = 26
AND field_id = 64
AND unixtime >= 1 /* a long time ago */
ORDER BY unixtime ASC /* <- using unixtime to order */
LIMIT 1
EXPLAIN:
1, SIMPLE, devicelog, index, device_id,field_id,field_id_2,unixtime, unixtime, 9, , 3504, Using where
Does anyone have any insight on the vast performance differences?
Upvotes: 2
Views: 508
Reputation: 10336
I think it's documented behavior of LIMIT optimization, see http://dev.mysql.com/doc/refman/5.5/en/limit-optimization.html
Optimizing LIMIT Queries
MySQL sometimes optimizes a query that has a LIMIT row_count clause and no HAVING clause:
[...] If you use LIMIT row_count with ORDER BY, MySQL ends the sorting as soon as it has found the first row_count rows of the sorted result, rather than sorting the entire result. If ordering is done by using an index, this is very fast. If a filesort must be done, all rows that match the query without the LIMIT clause are selected, and most or all of them are sorted, before the first row_count are found. After the initial rows have been found, MySQL does not sort any remainder of the result set.
[...]
As soon as MySQL has sent the required number of rows to the client, it aborts the query unless you are using SQL_CALC_FOUND_ROWS.
Because you're trying to get the ID next to a certain date, I would think ordering the result very vital, because else you can get an arbitrary value. Else you've got to use MIN(id) with your conditions to get the desired id value.
Upvotes: 1
Reputation: 108651
It's hard to make clear suggestions about performance without knowing stuff like the number of rows in your table, and the exact structure of the table.
You might try a compound covering index on (unixtime, device_id, file_id, id)
. (Look up covering index if you don't know that term).
This will allow the unixtime
part of your query to be satisfied with BTREE lookup, then the rest of your query can be satisfied with an index scan.
If you specify ORDER BY unixtime ASC LIMIT 1
you're telling the query engine to stop scanning that index (which is ordered by unixtime
as soon as it gets a single hit.
I don't know why it sometimes keeps going on the scan for seven seconds when you omit the ORDER BY. It's possible it has to hunt for the matching device_id
and file_id
values.
Upvotes: 1