James
James

Reputation: 1780

MySQL BETWEEN query - which part uses index?

Say I have a table foo(bar:int) with a normal btree index on bar, and the table contains 100 rows (with bar having values 2 to 101). When running the following query, how does MySQL decide whether to do the >= or the <= first?

SELECT bar from foo where bar BETWEEN 0 AND 1

If it did the >= then it would scan all 100 rows. On the other hand if it did the <= it would do 0 scans. Is there a way to specify which to do first?

This is particularly relevant for me for datetime range queries on very large tables containing years of historic data and where the timeframe requested is close to the current time. If it did the <= first then there would be a huge scan on the many years worth of data. For example:

SELECT * from table 
WHERE instant BETWEEN DATE_SUB(NOW(), INTERVAL 1 HOUR) AND DATE_SUB(NOW(), INTERVAL 1 MINUTE);

Upvotes: 1

Views: 340

Answers (1)

Michail Michailidis
Michail Michailidis

Reputation: 12181

If I understand your question correctly: When a B-tree index is created usually it is a B+tree http://en.wikipedia.org/wiki/B%2B_tree

B+ Tree representation

Wikipedia:"A simple B+ tree example linking the keys 1–7 to data values d1-d7. The linked list (red) allows rapid in-order traversal."

That means that the smallest element in the range is found (in your case the earliest date) in approximately log_b(N) time and then there are k hops from all the leafs of the B+ tree till all the elements in the range are exhausted.

k is the number of elements in the range that exist in the database and not all the possible values, N is the height of the tree (in Wikipedia example it is 2) and b is the branching factor of the tree (in Wikipedia example it is 3)

Edit: Cases:

  • In the case of only foo<=1 it goes in the B+tree and doesn't find anything so we have 0 scans.

  • In the case of only foo>=0 it will not find 0 but the first in it's values and it will go to 2 in your case. Then it will do the 100 scans

  • When you have 'foo 0 between 40' it is like foo<=40 AND foo>=0, so it will go to the first, in your case to 2 and then do (38hopes/scans assuming that all the values are in the database). In other words, they are not executed separately, so they will use the index together!

In general Sql servers have optimizers that can detect ranges and rewrite your queries by putting your ANDs in the right order. They also keep track of the query performance and they decide after estimating cost plans which execution plan to follow. If you have SQL Server you can see all these plans with any query.

Upvotes: 1

Related Questions