Speed up group by

Question

Consider the following query:

select * 
  from `cg_reviews` 
 inner join `cg_park` on `cg_park`.`cg_id` = `cg_reviews`.`cgr_cg_id` 
 where `status` = 'approved' 
   and `cgr_approval_time` > 0 
 group by `cg_park`.`cg_id` 
 order by `cgr_approval_time` desc limit 3

It basically asks for all approved reviews of a park, ordered by approval time and only 1 review per park. Meaning, I don't want more than 1 review per park, so each of the 3 results must be from a different park (cg_id).

This query works, but even with every index I can think of, is still slow over 18687 rows. When I remove the group by clause, it is lightning fast. How else can I do this query with unique parks?

DRapp · Accepted Answer

Based on your comment about APPROVED reviews, I have to assume the "status" column is on the reviews table. It is best to always have table.column or alias.column so others don't have to guess which column belongs where.

That said, I would have a multiple column index to help the order by and grouping. Specifically

table        index
cg_reviews   ( status, cgr_cg_id, cgr_approval_time )

Your park table cg_id = reviews cgr_cg_id, so since they will be the same value, your group by could be "cgr_cg_id" and take advantage of the index.

Without seeing data, you could have 100 reviews for each "cgr_cg_id" and you only care about the most recent... and from that, you only want the top 3. This part of the query has no bearing on the PARK table and can be done via

select
      r.cgr_cg_id,
      max( r.cgr_approval_time ) latestReview,
   from
      cg_reviews r
   where
          r.status = 'approved'
      and r.cgr_approval_time > 0
   group by
      r.cgr_cg_id
   order by
      max( r.cgr_approval_time ) DESC
   limit 3

So now you get 3 records out of all the reviews. The group by is the park ID column and since ordering by the most recent max time per group, you are not getting the same park with multiple reviews. Finally you need the details to present to the end-user. This should be quick as we can now join back to the park and review table on just these three record and should be virtually instantaneous by utilizing same index. Just make the above query your first from query giving it an alias

select
      r2.*,
      p.*
   from
      ( select
            r.cgr_cg_id,
            max( r.cgr_approval_time ) latestReview,
         from
            cg_reviews r
         where
                r.status = 'approved'
            and r.cgr_approval_time > 0
         group by
            r.cgr_cg_id
         order by
            max( r.cgr_approval_time ) DESC
         limit 3 ) PQ

         JOIN cg_reviews r2
            on r2.status = 'approved'
           AND PQ.cgr_cg_id = r2.cgr_cg_id 
           AND PQ.latestReview = r2.cgr_approval_time

         JOIN cg_park p
            on PQ.cgr_cg_id = p.cg_id

Speed up group by

Answers (2)

Related Questions