bodacydo
bodacydo

Reputation: 79449

What would be the fastest way to speed up a GROUP BY type of query in MySQL?

I've a table of articles, a table of authors, and a table that maps articles to authors.

I'm doing the following query to find out the authors with the most articles:

SELECT a.*, count(*) c
FROM articleAuthors aa
LEFT JOIN authors a ON aa.author_id=a.id
GROUP BY (author_name)
ORDER BY c DESC LIMIT 50

However this query takes a whole minute to complete. The database has about 1,000,000 records in articles_to_authors table.

How could I speed up this GROUP BY query?

Upvotes: 1

Views: 108

Answers (1)

DRapp
DRapp

Reputation: 48149

Under an assumption of the articleAuthors table having more than 50 distinct authors, I would pre-query just that component and limit to the 50 records you want. Ensure an index exists on (author_id). Also, ensure your authors table has an index on (id). Change your query to

select
      a.*,
      JustAuthorIDs.cntPerAuthor
   from
      ( select 
              aa.author_id, 
              count(*) cntPerAuthor
           from
              articleAuthors aa
           group by 
              aa.author_id
           order by
              cntPerAuthor DESC
           limit 50 ) JustAuthorIDs
      JOIN Authors a
         on JustAuthorIDs.author_ID = a.id

The order by count descending in the prequery will pre-flush AND be pre-ordered by largest count first and stop after 50 records. Then, a simple join to the authors table to get the name and whatever else.

I have the group by based on the author_ID instead of the name as what if you have two authors called "bill board"... The actual ID will be distinct between the two of them.

Now, with the above being a query, you will always be required to query through all million records every time. For something like this, it would PROBABLY be better to add a single "AuthoredItems" column in the authors table. Then, via a trigger on the authorArticles table, when an entry gets added or deleted, just update the final count for the one author on the author table. Then, build an index on the "AuthoredItems" column. Then, you can super simplify the query by doing

select a.*
   from authors a
   order by a.AuthoredItems
   limit 50

Upvotes: 3

Related Questions