Reputation: 5883
I have a large MyISAM table. It's approaching 1 million rows. It's basically a list of items and some information about them.
There are two indices:
I run two queries:
SELECT * FROM table WHERE date = '2011-02-01' AND col < 5 LIMIT 10
SELECT * FROM table WHERE date < '2011-02-01' AND col < 5 LIMIT 10
The first one finishes in ~0.0005 seconds and the second in ~0.05 seconds. That is 100X difference. Is it wrong for me to expect both of these to run at roughly the same speed? I must not be understanding the indices very well. How can I speed up the second query?
Upvotes: 8
Views: 3725
Reputation: 3892
MySQL stores its indexes by default in a BTREE. No hashing in general.
The short answer for the performance difference is that the < form evaluates more nodes then the = form.
The index that you've got on there (date, col) stores the values roughly like a phone book:
2011-01-01, col=1, row_ptr
2011-01-01, col=2, row_ptr
2011-01-01, col=3, row_ptr
etc...
2011-02-01, col=1, row_ptr
2011-02-01, col=2, row_ptr
2011-02-01, col=3, row_ptr
etc...
2011-02-02, col=1, row_ptr
2011-02-02, col=2, row_ptr
etc...
...in ascending sorted tree nodes of size B (2011-01-01, col=1) < (2011-01-01, col=2) < (2011-01-02, col=1).
Your question is essentially asking the difference between:
It should be obvious why #1 is so much faster then #2.
There are also considerations of memory /disk transfer efficiency and heap allocations (= does WAY fewer transfers then <) that account for a not-insignificant amount of time but depend largely on the distribution of the data and the specific location of the 2011-02-01, col=min(col) key record.
Upvotes: 2
Reputation: 49085
Regardless of Mysql it boils down to basic algorithm theory.
Greater than and Less than operations on a large set are slower than Identity operations. With a large data set an ideal data structure for determining less than or greater is a self balancing tree (binary or n-tree). On a a self balanced tree the worst case scenario to find all less/greater is log n.
The ideal data structure for identity lookup is a hashtable. The performance of hashtables is generally O(1) aka fixed time. A hashtable however is not good for greater/less.
Generally a well balanced tree is only slightly less performing than a hashtable (which is how Haskell gets away with using a tree for hashtables).
Thus irregardless of what Mysql does its not surprise that <,> is slower than =
Old Answer below:
Because the first one is like Hashtable lookup since its '=' (particularly if your index is a hashtable) it will be faster than the second one which might work better with a tree like index.
Since MySql allows to configure the index format you can try changing that but I'm rather sure the first will always run faster than the second.
Upvotes: 5
Reputation: 2579
I'm assuming you have an index on the date column. The first query uses the index, the second query probably does a linear scan (at least over part of the data). A direct fetch is always faster than a linear scan.
Upvotes: 2
Reputation: 21
The first one performs a seek over data where as the second one goes for a scan . Scans are always costlier than seeks hence the time difference .
Its like that, the the scan means running through all the pages of the book where as seek is directly jumping to a page number.
Hope this might help.
Upvotes: 1