Gil Cohen
Gil Cohen

Reputation: 846

MySQL JOIN and ORDER BY - performance issue

I have this query that drives me crazy for quite some time. It has 3 tables (originally it has a lot more but I isolated the performance issue), 1 base table, 1 product table which adds more data, and 1 with product types. The product types table contains a "max age" column which indicates the maximum age of a row I want to fetch (anything older is considered "archived") and its value is different according to the product type. My poor performance query goes like this and it takes 50 seconds for a 250,000 rows base table:

(select d_baseservices.ID
from d_baseservices    
inner join d_products on d_baseservices.ServiceID = d_products.ServiceID
inner join md_prodtypes on d_products.ProdType = md_prodtypes.ProdType
where
(d_baseservices.CreationDate > (curdate() - INTERVAL md_prodtypes.MaxAge DAY))
order by CreationDate desc 
limit 750);

Here is the EXPLAIN of this query:

id  select_type table   type    possible_keys   key key_len ref rows    Extra
1   SIMPLE  md_prodtypes    index   PRIMARY,ProdType_UNIQUE,ID_MAX_AGE  MAX_AGE 5       23  Using index; Using temporary; Using filesort
1   SIMPLE  d_products  ref PRIMARY,ServiceID_UNIQUE,fk_Products_BaseServices1,fk_d_products_md_prodtypes1  fk_d_products_md_prodtypes1 4   combina.md_prodtypes.ProdType   8625    
1   SIMPLE  d_baseservices  eq_ref  PRIMARY,CreationDateDesc_index,CreationDate_index   PRIMARY 8   combina.d_products.ServiceID    1   Using where

I found a clue a few days back, when I was able to determine that limiting the query to 750 records would cause is to go fast, but 751 would bring poor performance.

I tried creating indexes of many kinds, with no success. I tried removing the reference to MAX_AGE and the curdate function and just set a fixed value, with little success as the query now takes 20 seconds:

(select d_baseservices.ID
from d_baseservices    
inner join d_products on d_baseservices.ServiceID = d_products.ServiceID
inner join md_prodtypes on d_products.ProdType = md_prodtypes.ProdType
where
(d_baseservices.CreationDate > '2015-09-21 19:02:25')
order by CreationDate desc 
limit 750);

And the EXPLAIN command output:

id  select_type table   type    possible_keys   key key_len ref rows    Extra
1   SIMPLE  md_prodtypes    index   PRIMARY,ProdType_UNIQUE,ID_MAX_AGE  ProdType_UNIQUE 4       23  Using index; Using temporary; Using filesort
1   SIMPLE  d_products  ref PRIMARY,ServiceID_UNIQUE,fk_Products_BaseServices1,fk_d_products_md_prodtypes1  fk_d_products_md_prodtypes1 4   combina.md_prodtypes.ProdType   8625    
1   SIMPLE  d_baseservices  eq_ref  PRIMARY,CreationDateDesc_index,CreationDate_index   PRIMARY 8   combina.d_products.ServiceID    1   Using where\

Can anyone please help? I'm stuck for almost a month

Upvotes: 0

Views: 179

Answers (2)

Rick James
Rick James

Reputation: 142278

With a constant date...

INDEX(CreationDate)

That will encourage the optimizer to start with the table that can be filtered. Also, since the ORDER BY is on the same field, the WHERE, ORDER BY and LIMIT can all be done at the same time.

Otherwise, it must read all the relevant records from all 3 tables, sort them, then deliver 750 (or 751) of them.

Using MAX_AGE...

Now the optimizer won't know whether it is better to do as above or find all the rows, sort them, then deliver the LIMIT.

Upvotes: 0

Oz Solomon
Oz Solomon

Reputation: 3044

It's hard to say exactly what to do without knowing more about the specific data you have (how many rows in each table, how many rows you expect the query to return, the distribution of the data values, etc), but I'll make some educated guesses and hopefully point you in the right direction.

First an explanation about why taking md_prodtypes.MaxAge out of the query greatly reduced the run time: Prior to that change the database had no ability at all to filter using indexes because in order to see if rows are candidates for inclusion it had to join the three tables in order to compare CreationDate from the first table to MaxAge in the third table. There is simply no index that you can add to correlate these two values. You're forcing the database engine to look at every single row.

As to the 750 magic number - I'm guessing that past 750 results the database has to page data or that it's hitting some other memory limit based on the values in your specific MySQL configuration file. I wouldn't read too much into that 750 number.

Lastly I'd like to point out that the EXPLAIN of your second query is a bit strange since it's showing md_prodtypes as the first table despite the fact that you took MaxAge out of the WHERE. That means the database is starting from md_prodtypes then moving up to d_products and finally to d_baseservices and only then filtering based on the date. I'm guessing that you're expecting it to first filter on the date then join only when it's decided what baseservices records to include. It's impossible to know why this is happening with the information you've provided. Perhaps you are missing an index.
Another possibility may have to do with variance in your CreationDate column. Let me explain by example: Say you had a table of users, and each user had a gender column that could be either f or m. Let's pretend that we have a 50%/50% split of females and males. Now, if you add an index on the column gender and do a query filtered by WHERE gender='f' expecting that the index will filter out half of the records, you'd be surprised to see that the database will totally ignore the index and just scan the table. The reason being is that it's cheaper to just read the whole table if you know the index isn't filtering out enough (the alternative being jumping constantly from the index to the main table data). In your case, if the WHERE on the CreationDate column doesn't filter out enough records, then even if you have an index on it, it won't be used.

Upvotes: 2

Related Questions