Reputation: 4536
To start off, I have seen: Why is MAX() 100 times slower than ORDER BY ... LIMIT 1?
It looks like the same question, but the issue there is a lack of indexes. So let me clarify my case.
To generalize, I will simplify my two queries:
-- min:
SELECT min(id) FROM my_table WHERE s_time >= now() - INTERVAL 14 DAY;
-- exec time: ~0.260 s
-- order-limit:
SELECT id FROM my_table WHERE s_time >= now() - INTERVAL 14 DAY ORDER BY s_time, id LIMIT 1;
-- exec time: ~0.060 s
Here, id is the primary key, and s_time is an indexed timestamp.
Running explain format=json shows that the difference between these two queries is that the order-limit version has an ordering_operation with using_filesort: false. Both show roughly the same query_cost.
Now, my understanding is this: if a column is indexed, its values are ordered in a B-tree, and each index entry carries the corresponding primary key. Finding the first one (LIMIT 1) should be a simple traversal of the B-tree, and quite quick.
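As a concrete reference, here is a minimal schema sketch matching that description (the exact column types are my assumptions; key_length: "4" in the plans below is consistent with a 4-byte TIMESTAMP):
CREATE TABLE my_table (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT,  -- primary key, as described
  s_time TIMESTAMP NOT NULL,                -- indexed timestamp
  PRIMARY KEY (id),
  KEY s_time (s_time)  -- InnoDB secondary index entries also store the PK,
                       -- so this B-tree is effectively ordered (s_time, id)
) ENGINE=InnoDB;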
However, MIN(primary_key) FROM foo WHERE indexed_entry > bar should be handled in the same way. Is this simply a case of poor optimization by InnoDB?
If LIMIT has a special optimization path, where the server analyzes the memory requirements for the number of entries and, if possible, uses a priority queue instead of quicksort, shouldn't MIN() fall under that same case, as if it were LIMIT 1?
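For comparison, here is MIN() spelled out by hand as a LIMIT 1 query -- a sketch only; ordering by id means the s_time index no longer delivers rows in sorted order, so whether this is any faster depends on the data:
-- min(id) restated as ORDER BY ... LIMIT 1; the optimizer may walk the
-- PRIMARY KEY in id order and stop at the first row in the time range,
-- or it may use the s_time range and sort -- measure before relying on it
SELECT id FROM my_table WHERE s_time >= now() - INTERVAL 14 DAY ORDER BY id LIMIT 1;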
explain differences:
min-case:
{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "91987.68"
    },
    "table": {
      "table_name": "my_table",
      "access_type": "range",
      "possible_keys": [
        "s_time"
      ],
      "key": "s_time",
      "used_key_parts": [
        "s_time"
      ],
      "key_length": "4",
      "rows_examined_per_scan": 229128,
      "rows_produced_per_join": 229128,
      "filtered": "100.00",
      "using_index": true,
      "cost_info": {
        "read_cost": "46162.08",
        "eval_cost": "45825.60",
        "prefix_cost": "91987.68",
        "data_read_per_join": "104M"
      },
      "used_columns": [
        "id",
        "s_time"
      ],
      "attached_condition": "(`db`.`my_table`.`s_time` >= <cache>((now() - interval 14 day)))"
    }
  }
}
order-limit case:
{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "92215.71"
    },
    "ordering_operation": {
      "using_filesort": false,
      "table": {
        "table_name": "my_table",
        "access_type": "range",
        "possible_keys": [
          "s_time"
        ],
        "key": "s_time",
        "used_key_parts": [
          "s_time"
        ],
        "key_length": "4",
        "rows_examined_per_scan": 229696,
        "rows_produced_per_join": 229696,
        "filtered": "100.00",
        "using_index": true,
        "cost_info": {
          "read_cost": "46276.51",
          "eval_cost": "45939.20",
          "prefix_cost": "92215.71",
          "data_read_per_join": "105M"
        },
        "used_columns": [
          "id",
          "s_time"
        ],
        "attached_condition": "(`db`.`my_table`.`s_time` >= <cache>((now() - interval 14 day)))"
      }
    }
  }
}
Interesting related documentation: bool check_if_pq_applicable() in https://dev.mysql.com/doc/dev/mysql-server/8.0.0/filesort_8cc.html:
DESCRIPTION: Given a query like this:
SELECT ... FROM t ORDER BY a1,...,an LIMIT max_rows;
This function tests whether a priority queue should be used to keep the result. Necessary conditions are:
- estimate that it is actually cheaper than merge-sort
- enough memory to store the records
Upvotes: 0
Views: 143
Reputation: 142433
They do different things, hence one has to work harder.
SELECT min(id)
FROM my_table
WHERE s_time >= now() - INTERVAL 14 DAY;
Searches all of the items in the last two weeks to find the lowest id. INDEX(s_time, id) will help some.
SELECT id
FROM my_table
WHERE s_time >= now() - INTERVAL 14 DAY
ORDER BY s_time, id
LIMIT 1;
If you have INDEX(s_time, id), then it will look at only one row -- the first one from 14 days ago. No scan. No checking to see if it is the smallest id.
Note: If you have PRIMARY KEY(id), INDEX(s_time), then that index is effectively (s_time, id).
Since you probably inserted the rows in s_time order, the results will probably be the same. But the Optimizer has no way of knowing that.
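To see the difference with made-up data (a toy example, not your real table contents), suppose two rows where insertion order and s_time order disagree:
-- toy rows: id 1 is the smallest id but has a later s_time;
-- id 2 has the earliest s_time in the range
INSERT INTO my_table (id, s_time) VALUES
  (1, '2024-01-10 00:00:00'),
  (2, '2024-01-01 00:00:00');

SELECT min(id) FROM my_table WHERE s_time >= '2024-01-01';
-- returns 1 (smallest id in the range)

SELECT id FROM my_table WHERE s_time >= '2024-01-01' ORDER BY s_time, id LIMIT 1;
-- returns 2 (row with the earliest s_time)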
Upvotes: 1