Reputation: 11
Long time lurker, first time questioner. ;-)
Using PHP 5.6 and MySQL Ver 14.14 Distrib 5.6.41, for Win64 (x86_64) Yeah, I know a little behind the times and we're working on updating. But that's where we are now. ;-)
Updates for questions asked: The index is on the CreateDate. I thought there might be an issue with that column being a DateTime so I created another column that was just a date, set an index on that and retried, but it didn't have any effect.
ulc has 8965 rows total. With index searches 3787 et has 9530 rows. In the query that doesn't use the index, it searches just one row as it's searching on the primary key from the first query.
The formatting of the comparison date doesn't seem to matter. I've tried all sorts of formats, including just straight "2018-01-01 {00:00:00}'. No change.
I've got what I consider a weird one, but I suspect for someone here it's going to be a "duh!" one. I've got a query that includes a date range for the primary table and then goes to get other bits of data from other tables based on a set of unique ids from the first table. Don't worry, I'll have examples below. When I do the search on just the primary table, the range index works as expected and only searches the relevant rows. However, when I add in the next table with the ON
clause, it ignores the index and searches all of the rows of the primary table. If I leave off the on clause, it goes back to using the index correctly. I tried using the FORCE INDEX
(USE
is ignored) and while that makes it use the index, it slows the query way down. Anyway, here are the queries:
Works:
select CreateDate
from ulc
Inner Join et
WHERE ulc.CreateDate >= STR_TO_DATE("01/01/2018", "%m/%d/%Y")
AND ulc.CreateDate <= STR_TO_DATE("08/02/2018", "%m/%d/%Y")
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE ulc range index_CreateDate index_CreateDate 5 NULL 3787 Using where; Using index
1 SIMPLE et index NULL index_BankProcessorProfile 5 NULL 9530 Using index; Using join buffer (Block Nested Loop)
Doesn't work:
select CreateDate
from ulc
Inner Join et on et.TranID = ulc.TranID
WHERE ulc.CreateDate >= STR_TO_DATE("01/01/2018", "%m/%d/%Y")
AND ulc.CreateDate <= STR_TO_DATE("08/02/2018", "%m/%d/%Y")
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE ulc ALL TranID,index_CreateDate NULL NULL NULL 8965 Using where
1 SIMPLE et eq_ref PRIMARY PRIMARY 8 showpro.ulc.TranID 1 Using index
For the second one I just added the on et.TranID = ulc.TranID
Additionally, if I change it from a range to a specific date, the index works as well.
Upvotes: 0
Views: 236
Reputation: 142503
(I'm going to have to guess at some things, since you have not provided SHOW CREATE TABLE
. As a 'long time lurker', you should have realized this.)
First guess is that TranID
is not the PRIMARY KEY
of ulc
?
The solution is to add a "composite" INDEX(CreateDate, TranID)
to ulc
. (Actually, you should replace the existing INDEX(CreateDate)
(Second guess is that you have that index now.)
Now I will try to explain why the first query was happy with INDEX(CreateDate)
but the second was not.
In the first query, INDEX(CreateDate)
is a "covering" index. That is, this index contains all the columns of ulc
that are needed by the SELECT
. So, it is almost guaranteed that using the index would be better than scanning the table. It will be a "range index scan" of that index.
The second query needs both CreateDate
and TranID
, so your index won't be "covering". There are two ways to perform the first part of the query. But first, note that (in InnoDB) a secondary index has all the columns of the PRIMARY KEY
(third guess: it is (id)
).
TranID
, it first gets id
, then does a lookup in the PRIMARY KEY
/data to get TranID`. This process is more costly than simply staying in the index, so the Optimizer does not want to do it unless the estimated number of rows is 'small'.ALL
8965 rows, filtering out the ones not needed.My proposed index is 'covering', thereby avoiding the bounding back and forth between index and data. So, a range index scan is efficient.
Your observation that switching to a single date made use of the index -- Well, 1 row out of 8965 is 'small', so the index (and the bouncing) is deemed to be the faster way.
As for formatting of the date -- True, it does not matter. This is because the parser notices that STR_TO_DATE("01/01/2018", "%m/%d/%Y")
is a constant that can be evaluated once, and does so.
My cookbook should take you directly to the composite index without having to scratch your head over this Question.
Your first query is a "cross join" since it has no ON
clause to relate the tables together, and it will return about 35 million rows (9530*3787). The second query will have about 3787 rows, maybe fewer (if some of the joins fail to find a match).
"how little changes between the two queries" -- Never think that! The Optimizer will latch on to seemingly insignificant differences. SELECT CreateDate
versus SELECT *
-- a huge difference. Most of what I said about the 'first query' would be thrown out. Even changing to SELECT ChangeDate, x
would be enough to make a big wrinkle. If the datatypes of TranID
in the two tables differed enough, the indexes become useless. Etc, etc.
Upvotes: 0
Reputation: 1219
if you are using DateTime in your query its suggested to use "YYYY-MM-DD HH:MM:SS" in where class
if you are using Date in your query its suggested to use the format "YYYY-MM-DD" in your where class.you have used STR_TO_DATE("01/01/2018", "%m/%d/%Y") which will typecast to '2018-01-01' seems to be fine
you try to find the complexity of the query using EXPLAIN
explain select CreateDate
from ulc
Inner Join et on et.TranID = ulc.TranID
WHERE ulc.CreateDate >= STR_TO_DATE("01/01/2018", "%m/%d/%Y")
AND ulc.CreateDate <= STR_TO_DATE("08/02/2018", "%m/%d/%Y")
you can check if et.TranID and ulc.TranID have proper index or not
Upvotes: 0
Reputation: 26804
Just guessing here without more data,but adding a new table to the JOIN changes the data distribution.
So if in the first case the WHERE condition return probably a small(relatively) percentage of the data,in you second case the optimizer decides you`ll get faster results without using the index since the same conditions might not be quite so selective for the new batch of data.
Add the table definitions and a COUNT for both queries,both total and based on your queries,for a better answer.
Upvotes: 0