Reputation: 11

MySQL date range idex being ignored when a second table added to query

Long time lurker, first time questioner. ;-)

Using PHP 5.6 and MySQL Ver 14.14 Distrib 5.6.41, for Win64 (x86_64) Yeah, I know a little behind the times and we're working on updating. But that's where we are now. ;-)

Updates for questions asked: The index is on the CreateDate. I thought there might be an issue with that column being a DateTime so I created another column that was just a date, set an index on that and retried, but it didn't have any effect.

ulc has 8965 rows total. With index searches 3787 et has 9530 rows. In the query that doesn't use the index, it searches just one row as it's searching on the primary key from the first query.

The formatting of the comparison date doesn't seem to matter. I've tried all sorts of formats, including just straight "2018-01-01 {00:00:00}'. No change.

I've got what I consider a weird one, but I suspect for someone here it's going to be a "duh!" one. I've got a query that includes a date range for the primary table and then goes to get other bits of data from other tables based on a set of unique ids from the first table. Don't worry, I'll have examples below. When I do the search on just the primary table, the range index works as expected and only searches the relevant rows. However, when I add in the next table with the ON clause, it ignores the index and searches all of the rows of the primary table. If I leave off the on clause, it goes back to using the index correctly. I tried using the FORCE INDEX (USE is ignored) and while that makes it use the index, it slows the query way down. Anyway, here are the queries:

Works:

select CreateDate
from ulc
Inner Join et
WHERE ulc.CreateDate >= STR_TO_DATE("01/01/2018", "%m/%d/%Y")
AND ulc.CreateDate <= STR_TO_DATE("08/02/2018", "%m/%d/%Y")

id  select_type     table   type    possible_keys   key     key_len     ref     rows    Extra
1   SIMPLE  ulc     range   index_CreateDate    index_CreateDate    5   NULL    3787    Using where; Using index
1   SIMPLE  et  index   NULL    index_BankProcessorProfile  5   NULL    9530    Using index; Using join buffer (Block Nested Loop)

Doesn't work:

select CreateDate
from ulc
Inner Join et on et.TranID = ulc.TranID
WHERE ulc.CreateDate >= STR_TO_DATE("01/01/2018", "%m/%d/%Y")
AND ulc.CreateDate <= STR_TO_DATE("08/02/2018", "%m/%d/%Y")

id  select_type     table   type    possible_keys   key     key_len     ref     rows    Extra   
1   SIMPLE  ulc     ALL     TranID,index_CreateDate     NULL    NULL    NULL    8965    Using where
1   SIMPLE  et  eq_ref  PRIMARY     PRIMARY     8   showpro.ulc.TranID  1   Using index

For the second one I just added the on et.TranID = ulc.TranID

Additionally, if I change it from a range to a specific date, the index works as well.

Upvotes: 0

Answers (3)

Rick James

Reputation: 142503

(I'm going to have to guess at some things, since you have not provided SHOW CREATE TABLE. As a 'long time lurker', you should have realized this.)

First guess is that TranID is not the PRIMARY KEY of ulc?

The solution is to add a "composite" INDEX(CreateDate, TranID) to ulc. (Actually, you should replace the existing INDEX(CreateDate) (Second guess is that you have that index now.)

Now I will try to explain why the first query was happy with INDEX(CreateDate) but the second was not.

In the first query, INDEX(CreateDate) is a "covering" index. That is, this index contains all the columns of ulc that are needed by the SELECT. So, it is almost guaranteed that using the index would be better than scanning the table. It will be a "range index scan" of that index.

The second query needs both CreateDate and TranID, so your index won't be "covering". There are two ways to perform the first part of the query. But first, note that (in InnoDB) a secondary index has all the columns of the PRIMARY KEY (third guess: it is (id)).

Range scan of the index. But, in order to get TranID, it first gets id, then does a lookup in the PRIMARY KEY/data to get TranID`. This process is more costly than simply staying in the index, so the Optimizer does not want to do it unless the estimated number of rows is 'small'.
Since 3787/8965 is not "small", the Optimizer decides that it is probably faster to scan ALL 8965 rows, filtering out the ones not needed.

My proposed index is 'covering', thereby avoiding the bounding back and forth between index and data. So, a range index scan is efficient.

Your observation that switching to a single date made use of the index -- Well, 1 row out of 8965 is 'small', so the index (and the bouncing) is deemed to be the faster way.

As for formatting of the date -- True, it does not matter. This is because the parser notices that STR_TO_DATE("01/01/2018", "%m/%d/%Y") is a constant that can be evaluated once, and does so.

My cookbook should take you directly to the composite index without having to scratch your head over this Question.

Your first query is a "cross join" since it has no ON clause to relate the tables together, and it will return about 35 million rows (9530*3787). The second query will have about 3787 rows, maybe fewer (if some of the joins fail to find a match).

"how little changes between the two queries" -- Never think that! The Optimizer will latch on to seemingly insignificant differences. SELECT CreateDate versus SELECT * -- a huge difference. Most of what I said about the 'first query' would be thrown out. Even changing to SELECT ChangeDate, x would be enough to make a big wrinkle. If the datatypes of TranID in the two tables differed enough, the indexes become useless. Etc, etc.

Upvotes: 0

Mahesh Hegde

Reputation: 1219

if you are using DateTime in your query its suggested to use "YYYY-MM-DD HH:MM:SS" in where class

if you are using Date in your query its suggested to use the format "YYYY-MM-DD" in your where class.you have used STR_TO_DATE("01/01/2018", "%m/%d/%Y") which will typecast to '2018-01-01' seems to be fine

you try to find the complexity of the query using EXPLAIN

explain select CreateDate
from ulc
Inner Join et on et.TranID = ulc.TranID
WHERE ulc.CreateDate >= STR_TO_DATE("01/01/2018", "%m/%d/%Y")
AND ulc.CreateDate <= STR_TO_DATE("08/02/2018", "%m/%d/%Y")

you can check if et.TranID and ulc.TranID have proper index or not

Upvotes: 0

Mihai

Reputation: 26804

Just guessing here without more data,but adding a new table to the JOIN changes the data distribution.

So if in the first case the WHERE condition return probably a small(relatively) percentage of the data,in you second case the optimizer decides you`ll get faster results without using the index since the same conditions might not be quite so selective for the new batch of data.

Add the table definitions and a COUNT for both queries,both total and based on your queries,for a better answer.

Upvotes: 0

MySQL date range idex being ignored when a second table added to query

Answers (3)

Related Questions