Reputation: 151
I'm scratching my head over a query optimization problem in MariaDB. I have a table with 300,000,000 rows and an index on the SEND_DATE column, and the following query runs very fast:
SELECT
*
FROM
MESSAGE
WHERE
SEND_DATE >= '2024-01-01 00:00:00'
AND SEND_DATE < '2024-06-01 00:00:00';
Execution time: 0.044 seconds. 🚀 (plus fetch time ~150s)
However, when I increase the date range by just one month, like this:
SELECT
*
FROM
MESSAGE
WHERE
SEND_DATE >= '2024-01-01 00:00:00'
AND SEND_DATE < '2024-07-01 00:00:00';
The query suddenly slows down dramatically – execution time jumps to 466 seconds. 😱 (plus fetch time ~150s)
It seems that MariaDB stops using the index and switches to a full table scan instead.
Interestingly, when I force the index using FORCE INDEX, the query runs fast again, even for much larger date ranges:
SELECT
*
FROM
MESSAGE FORCE INDEX (INDEX_SEND_DATE)
WHERE
SEND_DATE >= '2024-01-01 00:00:00'
AND SEND_DATE < '2024-07-01 00:00:00';
With FORCE INDEX, the query performs as expected. 🚀
How can I configure MariaDB to always prefer the index for this type of query, even for larger date ranges? I’d prefer not to manually force the index with FORCE INDEX in every query.
Any suggestions or insights are greatly appreciated! 😊
DB version: 10.4.12
I ran ANALYZE FORMAT=JSON. For the narrower date range:
{
"query_block": {
"select_id": 1,
"r_loops": 1,
"r_total_time_ms": 68531,
"table": {
"table_name": "MESSAGE",
"access_type": "range",
"possible_keys": ["INDEX_SEND_DATE"],
"key": "INDEX_SEND_DATE",
"key_length": "5",
"used_key_parts": ["SEND_DATE"],
"r_loops": 1,
"rows": 24805704,
"r_rows": 1.38e7,
"r_total_time_ms": 65458,
"filtered": 100,
"r_filtered": 100,
"index_condition": "MESSAGE.SEND_DATE >= '2024-01-01 00:00:00' and MESSAGE.SEND_DATE < '2024-05-01 00:00:00'"
}
}
}
For the wider date range:
{
"query_block": {
"select_id": 1,
"r_loops": 1,
"r_total_time_ms": 632820,
"table": {
"table_name": "MESSAGE",
"access_type": "ALL",
"possible_keys": ["INDEX_SEND_DATE"],
"r_loops": 1,
"rows": 241889492,
"r_rows": 2.99e8,
"r_total_time_ms": 582567,
"filtered": 24.389,
"r_filtered": 8.9888,
"attached_condition": "MESSAGE.SEND_DATE >= '2024-01-01 00:00:00' and MESSAGE.SEND_DATE < '2024-08-01 00:00:00'"
}
}
}
Upvotes: 3
Views: 84
Reputation: 142503
This is a design consideration built into the Optimizer.
The Optimizer looks in the index's BTree to guesstimate the number of rows matching the range. If that is more than some threshold (often about 20% of the table), it decides that a table scan will be faster. Note: there is no practical way to tune this cutoff.
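One way to see which side of that cutoff you are on is to compare the optimizer's row estimate for the range against the table's total row count. A sketch, assuming MESSAGE is in the current schema (EXPLAIN's rows column and information_schema.TABLES.TABLE_ROWS are both estimates, not exact counts):

```sql
-- The "rows" column shows the optimizer's estimate for the range;
-- "type" shows whether it chose range (index) or ALL (table scan).
EXPLAIN
SELECT * FROM MESSAGE
WHERE SEND_DATE >= '2024-01-01 00:00:00'
  AND SEND_DATE < '2024-07-01 00:00:00';

-- Rough total row count, without scanning the table:
SELECT TABLE_ROWS
FROM information_schema.TABLES
WHERE TABLE_NAME = 'MESSAGE';
```

If the range estimate divided by TABLE_ROWS is above roughly the 20% mark, expect the optimizer to flip to a table scan.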
Why does that matter? Let me explain how an index lookup works (at least in your example):
The index is a separate BTree, ordered by SEND_DATE, and each entry carries the row's PRIMARY KEY.
Drill down that BTree to the first entry in the range, then walk forward through the matching entries.
For each entry, reach over into the data's BTree (keyed by the PRIMARY KEY) to fetch the entire row (because of SELECT *).
Note the back and forth and the many BTree lookups. These cost something; it is hard to know how much.
The alternative, a table scan, is to look at every row in the table, essentially a sequential read through the data's B+Tree. No back and forth; no drilling down BTrees.
Conclusion: Any of these cases will make using the index slower than a table scan:
The range matching a large fraction of the table.
innodb_buffer_pool_size too small for the task.
Little of the index or data being cached (in the buffer pool or by the OS).
See if
SELECT * FROM MESSAGE
WHERE SEND_DATE >= '2024-01-01'
AND SEND_DATE < '2024-08-01'
ORDER BY SEND_DATE;
will work faster. Note that the ORDER BY may lead the Optimizer to pick the index so it can avoid a sort, thereby changing its decision process.
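To check the buffer-pool point above, compare the pool size against the table's footprint. A sketch, assuming MESSAGE is in the current schema (both figures are in bytes, and DATA_LENGTH/INDEX_LENGTH are estimates for InnoDB):

```sql
-- How much RAM InnoDB has for caching pages:
SHOW GLOBAL VARIABLES LIKE 'innodb_buffer_pool_size';

-- Approximate on-disk size of the table's data and indexes:
SELECT DATA_LENGTH + INDEX_LENGTH AS approx_bytes
FROM information_schema.TABLES
WHERE TABLE_NAME = 'MESSAGE';
```

If the table dwarfs the buffer pool, the per-row BTree lookups of an index scan turn into random disk reads, which is exactly when the table scan wins.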
Upvotes: 0