Reputation: 160
There are two tables: posts
(~5,000,000 rows) and relations
(~8,000 rows).
posts
columns:
-------------------------------------------------
| id | source_id | content | date (int) |
-------------------------------------------------
relations
columns:
---------------------------
| source_id | user_id |
---------------------------
I wrote a MySQL query for getting 10 most recent rows from posts
which are related to a specific user:
SELECT p.id, p.content
FROM posts AS p
LEFT JOIN relations AS r
ON r.source_id = p.source_id
WHERE r.user_id = 1
ORDER BY p.date DESC
LIMIT 10
However, it takes ~30 seconds to execute it.
I already have indexes at relations
for (source_id, user_id)
, (user_id)
and for (source_id)
, (date)
, (date, source_id)
at posts
.
EXPLAIN
results:
How can I optimize the query?
Upvotes: 2
Views: 2340
Reputation: 10216
I would try with a composite index on relations :
INDEX source_user (user_id,source_id)
and change the query to this :
SELECT p.id, p.content
FROM posts AS p
INNER JOIN relations AS r
ON ( r.user_id = 1 AND r.source_id = p.source_id )
ORDER BY p.date DESC
LIMIT 10
Upvotes: 0
Reputation: 94914
Your WHERE clause renders your outer join a mere inner join (because in an outer-joined pseudo record user_id will always be null, never 1).
If you really want this to be an outer join then it is completely superfluous, because every record in posts
either has or has not a match in relations
of course. Your query would then be
select id, content
from posts
order by "date" desc limit 10;
If you don't want this to be an outer join really, but want a match in relations
, then we are talking about existence in a table, an EXISTS
or IN
clause hence:
select id, content
from posts
where source_id in
(
select source_id
from relations
where user_id = 1
)
order by "date" desc
limit 10;
There should be an index on relations(user_id, source_id)
- in this order, so we can select user_id
1 first and get an array of all desired source_id
which we then look up.
Of course you also need an index on posts(source_id)
which you probably have already, as source_id
is an ID. You can even speed things up with a composite index posts(source_id, date, id, content)
, so the table itself doesn't have to be read anymore - all the information needed is in the index already.
UPDATE: Here is the related EXISTS
query:
select id, content
from posts p
where exists
(
select *
from relations r
where r.user_id = 1
and r.source_id = p.source_id
)
order by "date" desc
limit 10;
Upvotes: 2
Reputation: 1888
I'd consider the following :-
Firstly, you only want the 10 most recent rows from posts which are related to a user. So, an INNER JOIN
should do just fine.
SELECT p.id, p.content
FROM posts AS p
JOIN relations AS r
ON r.source_id = p.source_id
WHERE r.user_id = 1
ORDER BY p.date DESC
LIMIT 10
The LEFT JOIN
is needed if you want to fetch the records which do not have a relations
mapping. Hence, doing the LEFT JOIN
results in a full table scan of the left table, which as per your info, contains ~5,000,000 rows. This could be the root cause of your query.
For further optimisation, consider moving the WHERE
clause into the ON
clause.
SELECT p.id, p.content
FROM posts AS p
JOIN relations AS r
ON (r.source_id = p.source_id AND r.user_id = 1)
ORDER BY p.date DESC
LIMIT 10
Upvotes: 0
Reputation: 751
You could put an index on the date column of the posts table, I believe that will help the order-by speed.
You could also try reducing the number of results before ordering with some additional where statements. For example if you know the that there will likely be ten records with the correct user_id today, you could limit the date to just today (or N days back depending on your actual data).
Upvotes: 1
Reputation: 714
Try This
SELECT p.id, p.content FROM posts AS p
WHERE p.source_id IN (SELECT source_id FROM relations WHERE user_id = 1)
ORDER BY p.date DESC
LIMIT 10
Upvotes: 0