silgon
silgon

Reputation: 7211

Finding closest value. How to tell MySQL that the data is already ordered?

Let's say I have a table like the following:

+-----------+------------+------+-----+---------+
| Field     | Type       | Null | Key | Default |
+------------+------------+------+-----+---------+
| datetime   | double     | NO   | PRI | NULL    |
| some_value | float      | NO   |     | NULL    |
+------------+------------+------+-----+---------+

Date is necessary to be in double and is registered in unix time with fractional seconds (no possibility to install mysql 5.6 to use fractional DATETIME). In addition, the values of the field datetime are not only primary, they are also always increasing. I would like to find the closest row to certain value. Usually you can use something like:

select * from table order by abs(datetime - $myvalue) limit 1

However, I'm afraid that this implementation will be slow for hundred thousands of values, because it is going to search in all the database. And since I have an ordered list, I know I can do some binary search to speed up the process, but I have no idea how to tell MySQL to perform such kind of search.

In order to test the performance I do the following lines:

SET profiling = 1;
SELECT * FROM table order by abs(datetime - $myvalue) limit 1;
SHOW PROFILE FOR QUERY 1;

With the following results:

+--------------------------------+----------+
| Status                         | Duration |
+--------------------------------+----------+
| starting                       | 0.000122 |
| Waiting for query cache lock   | 0.000051 |
| checking query cache for query | 0.000191 |
| checking permissions           | 0.000038 |
| Opening tables                 | 0.000094 |
| System lock                    | 0.000047 |
| Waiting for query cache lock   | 0.000085 |
| init                           | 0.000103 |
| optimizing                     | 0.000031 |
| statistics                     | 0.000057 |
| preparing                      | 0.000049 |
| executing                      | 0.000023 |
| Sorting result                 | 2.806665 |
| Sending data                   | 0.000359 |
| end                            | 0.000049 |
| query end                      | 0.000033 |
| closing tables                 | 0.000050 |
| freeing items                  | 0.000089 |
| logging slow query             | 0.000067 |
| cleaning up                    | 0.000032 |
+--------------------------------+----------+

Which in my understanding, the sorting the result takes 2.8 seconds, however my data is already sorted. As additional information, I have around 240,000 rows.

Upvotes: 0

Views: 45

Answers (3)

user207421
user207421

Reputation: 310909

It won't scan the entire database. A primary key is indexed by a B-tree. Forcing it into a binary search would be slower, if you could do it, which you can't.

Upvotes: 1

Sunil Singhal
Sunil Singhal

Reputation: 603

Indexes are supported in RDBMSs. Define an index on date time or field of your interest and db will not do the complete table scan

Upvotes: 0

Tomasz Maciejewski
Tomasz Maciejewski

Reputation: 513

Try making it a field:

select abs(datetime - $myvalue) as date_diff, table.*
from table
order by date_diff
limit 1

Upvotes: 0

Related Questions