Reputation: 79449
I've a table of articles, a table of authors, and a table that maps articles to authors.
I'm doing the following query to find out the authors with the most articles:
SELECT a.*, count(*) c
FROM articleAuthors aa
LEFT JOIN authors a ON aa.author_id=a.id
GROUP BY (author_name)
ORDER BY c DESC LIMIT 50
However this query takes a whole minute to complete. The database has about 1,000,000 records in articles_to_authors table.
How could I speed up this GROUP BY query?
Upvotes: 1
Views: 108
Reputation: 48149
Under an assumption of the articleAuthors table having more than 50 distinct authors, I would pre-query just that component and limit to the 50 records you want. Ensure an index exists on (author_id). Also, ensure your authors table has an index on (id). Change your query to
select
a.*,
JustAuthorIDs.cntPerAuthor
from
( select
aa.author_id,
count(*) cntPerAuthor
from
articleAuthors aa
group by
aa.author_id
order by
cntPerAuthor DESC
limit 50 ) JustAuthorIDs
JOIN Authors a
on JustAuthorIDs.author_ID = a.id
The order by count descending in the prequery will pre-flush AND be pre-ordered by largest count first and stop after 50 records. Then, a simple join to the authors table to get the name and whatever else.
I have the group by based on the author_ID instead of the name as what if you have two authors called "bill board"... The actual ID will be distinct between the two of them.
Now, with the above being a query, you will always be required to query through all million records every time. For something like this, it would PROBABLY be better to add a single "AuthoredItems" column in the authors table. Then, via a trigger on the authorArticles table, when an entry gets added or deleted, just update the final count for the one author on the author table. Then, build an index on the "AuthoredItems" column. Then, you can super simplify the query by doing
select a.*
from authors a
order by a.AuthoredItems
limit 50
Upvotes: 3