Osvaldas
Osvaldas

Reputation: 160

MySQL: combination of LEFT JOIN and ORDER BY is slow

There are two tables: posts (~5,000,000 rows) and relations (~8,000 rows).

posts columns:

-------------------------------------------------
|  id  |  source_id  |  content  |  date (int)  |
-------------------------------------------------

relations columns:

---------------------------
|  source_id  |  user_id  |
---------------------------

I wrote a MySQL query for getting 10 most recent rows from posts which are related to a specific user:

SELECT      p.id, p.content
FROM        posts AS p
LEFT JOIN   relations AS r
ON          r.source_id = p.source_id
WHERE       r.user_id = 1
ORDER BY    p.date DESC
LIMIT       10

However, it takes ~30 seconds to execute it.

I already have indexes at relations for (source_id, user_id), (user_id) and for (source_id), (date), (date, source_id) at posts.

EXPLAIN results:

EXPLAIN results

How can I optimize the query?

Upvotes: 2

Views: 2340

Answers (5)

Thomas G
Thomas G

Reputation: 10216

I would try with a composite index on relations :

INDEX source_user (user_id,source_id)

and change the query to this :

SELECT      p.id, p.content
FROM        posts AS p
INNER JOIN   relations AS r 
ON ( r.user_id = 1 AND r.source_id = p.source_id )
ORDER BY    p.date DESC
LIMIT       10

Upvotes: 0

Thorsten Kettner
Thorsten Kettner

Reputation: 94914

Your WHERE clause renders your outer join a mere inner join (because in an outer-joined pseudo record user_id will always be null, never 1).

If you really want this to be an outer join then it is completely superfluous, because every record in posts either has or has not a match in relations of course. Your query would then be

select id, content 
from posts 
order by "date" desc limit 10;

If you don't want this to be an outer join really, but want a match in relations, then we are talking about existence in a table, an EXISTS or IN clause hence:

select id, content
from posts
where source_id in
(
  select source_id
  from relations
  where user_id = 1
)
order by "date" desc
limit 10;

There should be an index on relations(user_id, source_id) - in this order, so we can select user_id 1 first and get an array of all desired source_id which we then look up.

Of course you also need an index on posts(source_id) which you probably have already, as source_id is an ID. You can even speed things up with a composite index posts(source_id, date, id, content), so the table itself doesn't have to be read anymore - all the information needed is in the index already.

UPDATE: Here is the related EXISTS query:

select id, content
from posts p
where exists
(
  select *
  from relations r
  where r.user_id = 1
  and r.source_id = p.source_id
)
order by "date" desc
limit 10;

Upvotes: 2

Sarath Chandra
Sarath Chandra

Reputation: 1888

I'd consider the following :-

Firstly, you only want the 10 most recent rows from posts which are related to a user. So, an INNER JOIN should do just fine.

SELECT      p.id, p.content
FROM        posts AS p
JOIN        relations AS r
ON          r.source_id = p.source_id
WHERE       r.user_id = 1
ORDER BY    p.date DESC
LIMIT       10

The LEFT JOIN is needed if you want to fetch the records which do not have a relations mapping. Hence, doing the LEFT JOIN results in a full table scan of the left table, which as per your info, contains ~5,000,000 rows. This could be the root cause of your query.

For further optimisation, consider moving the WHERE clause into the ON clause.

SELECT      p.id, p.content
FROM        posts AS p
JOIN        relations AS r
ON          (r.source_id = p.source_id AND r.user_id = 1)
ORDER BY    p.date DESC
LIMIT       10

Upvotes: 0

Reid Hughes
Reid Hughes

Reputation: 751

You could put an index on the date column of the posts table, I believe that will help the order-by speed.

You could also try reducing the number of results before ordering with some additional where statements. For example if you know the that there will likely be ten records with the correct user_id today, you could limit the date to just today (or N days back depending on your actual data).

Upvotes: 1

Md. Khairul Hasan
Md. Khairul Hasan

Reputation: 714

Try This

    SELECT p.id, p.content FROM posts AS p
    WHERE p.source_id IN (SELECT source_id FROM relations WHERE user_id = 1)
    ORDER BY  p.date DESC
    LIMIT       10

Upvotes: 0

Related Questions