Reputation: 398
I have 2 large mysql tables: Articles and ArticleTopics. I want to query the DB and retrieve the last 30 articles published for a given topicID. My current query is rather slow. Any ideas on how to improve it?
The tables:
Articles (~1 million rows)
+-----------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+--------------+------+-----+---------+----------------+
| articleId | int(11) | NO | PRI | NULL | auto_increment |
| title | varchar(255) | NO | | NULL | |
| content | longtext | NO | | NULL | |
| pubDate | datetime | NO | MUL | NULL | |
+-----------+--------------+------+-----+---------+----------------+
ArticleTopics (~10 million rows)
+-----------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------+--------------+------+-----+---------+-------+
| articleId | int(11) | NO | MUL | NULL | |
| topicId | int(11) | NO | MUL | NULL | |
+-----------+--------------+------+-----+---------+-------+
And my query:
SELECT a.articleId, a.pubDate
FROM Articles a, ArticleTopics t
WHERE t.articleId=a.articleId AND t.topicId=3364
ORDER BY a.pubDate DESC LIMIT 30;
And the EXPLAIN of the query:
+----+-------------+-------+--------+-------------------------------------+-------------------+---------+-------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+-------------------------------------+-------------------+---------+-------------------+------+----------------------------------------------+
| 1 | SIMPLE | t | ref | articleId,topicId,topicId_articleId | topicId_articleId | 4 | const | 4281 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | a | eq_ref | PRIMARY,articleId_pubDate | PRIMARY | 4 | t.articleId | 1 | |
+----+-------------+-------+--------+-------------------------------------+-------------------+---------+-------------------+------+----------------------------------------------+
The slowness, I believe, is coming from the ORDER BY a.pubDate DESC
. I can greatly improve performance by faking it a bit by instead doing an ORDER BY t.articleId DESC
and having an index in ArticleTopics
on both articleId
& topicId
, since in general, the articleIds are in the same order as pubDates. They are not always, however, so it's not ideal. I'd like to be able to sort it on the pubDate.
Update: Added EXPLAIN.
Upvotes: 0
Views: 3344
Reputation: 238246
You can rewrite the query in various ways to see if it speeds things up:
SELECT a.articleId, a.pubDate
FROM Articles a
WHERE a.articleId in (
select articleId
from ArticleTopics
where topicId = 3364
)
ORDER BY a.pubDate DESC LIMIT 30;
Or:
SELECT a.articleId, a.pubDate
FROM Articles a
INNER JOIN ArticleTopics t ON t.articleId = a.articleId
WHERE t.topicId = 3364
ORDER BY a.pubDate DESC LIMIT 30;
The important index for both queries is on Articles, and contains articleId as first field.
If article is a large table, with say the entire PDF in binary, you can create an index that fully covers the query. Full coverage means all selected fields are part of the index. For this query, a fully covering index would be (articleId, pubDate).
Upvotes: 1
Reputation: 179179
At this point, do you have an index on topicId
? If so, does the index contain only the topicId
field?
And maybe you can post the output of the EXPLAIN
query.
Upvotes: 0