David G
David G

Reputation: 6871

mysql joining efficiency - join with where then join with something else

I have a query that looks like this:

select `adverts`.*
 from `adverts` 
    inner join `advert_category` on `advert_category`.`advert_id` = `adverts`.`id`
    inner join `advert_location` on `adverts`.`id` = `advert_location`.`advert_id`
    where `advert_location`.`location_id` = ? 
    and `advert_category`.`category_id` = ?
    order by `updated_at` desc

The problem here is I have a huge database and this response is absolutely ravaging my database.

What I really need is to do the first join, and then do there where clause. This will whittle down my response from like 100k queries to less than 10k, then I want to do the other join, in order to whittle down the responses again so I can get the advert_location on the category items.

Doing it as is just isn't viable.

So, how do I go about using a join and a where condition, and then after getting that response doing a further join with a where condition?

Thanks

Upvotes: 1

Views: 605

Answers (3)

Bill Karwin
Bill Karwin

Reputation: 562230

MySQL will reorder inner joins for you during optimization, regardless of how you wrote them in your query. Inner join is the same in either direction (in algebra this is called commutative), so this is safe to do.

You can see the result of join reordering if you use EXPLAIN on your query.

If you don't like the order MySQL chose for your joins, you can override it with this kind of syntax:

from `adverts` 
straight_join `advert_category` ...

https://dev.mysql.com/doc/refman/5.7/en/join.html says:

STRAIGHT_JOIN is similar to JOIN, except that the left table is always read before the right table. This can be used for those (few) cases for which the join optimizer processes the tables in a suboptimal order.

Once the optimizer has decided on the join order, it always does one join at a time, in that order. This is called the nested join method.

There isn't really any way to "do the join then do the where clause". Conditions are combined together when looking up rows for joined tables. But this is a good thing, because you can then create a compound index that helps match rows based on both join conditions and where conditions.

PS: When asking query optimization question, you should include the EXPLAIN output, and also run SHOW CREATE TABLE <tablename> for each table, and include the result. Then we don't have to guess at the columns and indexes in your table.

Upvotes: 0

dnoeth
dnoeth

Reputation: 60462

You can write the 1st join in a Derived Table including a WHERE-condition and then do the 2nd join (but a decent optimizer might resolve the Derived Table again and do what he thinks is best based on statistics):

select adverts.*
from
  (
   select `adverts`.*
   from `adverts` 
   inner join `advert_category`
     on `advert_category`.`advert_id` =`adverts`.`id`
   where `advert_category`.`category_id` = ?
 ) as adverts
inner join `advert_location`
  on `adverts`.`id` = `advert_location`.`advert_id`
where `advert_location`.`location_id` = ? 
order by `updated_at` desc

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1269483

This is your query, written a bit simpler so I can read it:

select a.*
from adverts a inner join 
     advert_category ac
     on ac.advert_id = a.id inner join
     advert_location al
     on al.advert_id = a.id
where al.location_id = ? and
      ac.category_id = ?
order by a.updated_at desc;

I am speculating that advert_category and advert_locations have multiple rows per advert. In that case, you are getting a Cartesian product for each advert.

A better way to write the query uses exists:

select a.*
from adverts a
where exists (select 1
              from advert_location al
              where al.advert_id = a.id and al.location_id = ?
             ) and
      exists (select 1
              from advert_category ac
              where ac.advert_id = a.id and ac.category_id = ?
             )
order by a.updated_at desc;

For this version, you want indexes on advert_location(advert_id, location_id), advert_category(advert_id, category_id), and probably advert(updated_at, id).

Upvotes: 1

Related Questions