Reputation: 1331
Consider the following query:
select *
from `cg_reviews`
inner join `cg_park` on `cg_park`.`cg_id` = `cg_reviews`.`cgr_cg_id`
where `status` = 'approved'
and `cgr_approval_time` > 0
group by `cg_park`.`cg_id`
order by `cgr_approval_time` desc limit 3
It basically asks for all approved reviews of a park, ordered by approval time and only 1 review per park. Meaning, I don't want more than 1 review per park, so each of the 3 results must be from a different park (cg_id).
This query works, but even with every index I can think of, is still slow over 18687 rows. When I remove the group by clause, it is lightning fast. How else can I do this query with unique parks?
Upvotes: 0
Views: 151
Reputation: 48139
Based on your comment about APPROVED reviews, I have to assume the "status" column is on the reviews table. It is best to always have table.column or alias.column so others don't have to guess which column belongs where.
That said, I would have a multiple column index to help the order by and grouping. Specifically
table index
cg_reviews ( status, cgr_cg_id, cgr_approval_time )
Your park table cg_id = reviews cgr_cg_id, so since they will be the same value, your group by could be "cgr_cg_id" and take advantage of the index.
Without seeing data, you could have 100 reviews for each "cgr_cg_id" and you only care about the most recent... and from that, you only want the top 3. This part of the query has no bearing on the PARK table and can be done via
select
r.cgr_cg_id,
max( r.cgr_approval_time ) latestReview,
from
cg_reviews r
where
r.status = 'approved'
and r.cgr_approval_time > 0
group by
r.cgr_cg_id
order by
max( r.cgr_approval_time ) DESC
limit 3
So now you get 3 records out of all the reviews. The group by is the park ID column and since ordering by the most recent max time per group, you are not getting the same park with multiple reviews. Finally you need the details to present to the end-user. This should be quick as we can now join back to the park and review table on just these three record and should be virtually instantaneous by utilizing same index. Just make the above query your first from query giving it an alias
select
r2.*,
p.*
from
( select
r.cgr_cg_id,
max( r.cgr_approval_time ) latestReview,
from
cg_reviews r
where
r.status = 'approved'
and r.cgr_approval_time > 0
group by
r.cgr_cg_id
order by
max( r.cgr_approval_time ) DESC
limit 3 ) PQ
JOIN cg_reviews r2
on r2.status = 'approved'
AND PQ.cgr_cg_id = r2.cgr_cg_id
AND PQ.latestReview = r2.cgr_approval_time
JOIN cg_park p
on PQ.cgr_cg_id = p.cg_id
Upvotes: 1
Reputation: 781255
it would be better to do the grouping in the larger table before joining rather than after.
SELECT *
FROM (
SELECT *
FROM cg_reviews
WHERE cgr_approval_time > 0
AND status = 'approved'
GROUP BY cgr_cg_id
) AS cgr
INNER JOIN cg_park AS cgp ON cgp.cg_id = cgr.cgr_cg_id
ORDER BY cgr_approval_time DESC
LIMIT 3
Make sure there's an index on cgr_cg_id
(if it's declared as a FOREIGN KEY
the index will be created automatically).
Note that this grouping will not produce reliable results. Since the rows of cg_reviews
are not uniquely determined by the column you're grouping by, you'll get columns from arbitrary rows in each group. You should probably see SQL select only rows with max value on a column for how to select a well-defined row in each group (e.g. the row with the most recent approval time).
Upvotes: 0