Reputation: 617
I have SQL (MySQL) that I've can't figure out. The application is using uploaded photos where there are many tagged participants in a photo and there is the possibility to give photos a vote between 1 to 5.
The original query gets all the votes for a photo and orders them by amount of votes and the average of those votes.
Now I need to limit the returned photos by the ones with more than 1 participant. So photos with only 1 participant should not be accounted for.
Simplified schema looks like this.
PHOTOS
----------------------
| id | title |
----------------------
| 1 | Fun stuff |
| 2 | Crazy girls |
| 3 | Single boy |
PHOTO_VOTES
-------------------------------------------
| photo_id | grade | date | user_id |
-------------------------------------------
| 1 | 3 | … | 12 |
| 1 | 3 | … | 12 |
| 2 | 5 | … | 14 |
| 2 | 4 | … | 14 |
| 3 | 4 | … | 15 |
| 3 | 4 | … | 18 |
PHOTO_PARTICIPANTS
-------------------------
| photo_id | user_id |
-------------------------
| 1 | 12 |
| 1 | 21 |
| 1 | 33 |
| 2 | 14 |
| 2 | 33 |
| 3 | 12 |
This is how far I got:
SELECT vote.photo_id,
COUNT(vote.photo_id) AS vote_count,
AVG(vote.grade) AS vote_average,
COUNT(pp.photo_id) AS participant_count
FROM photo_votes vote
LEFT JOIN photos p ON (vote.photo_id = p.id)
LEFT JOIN photo_participants pp ON (pp.photo_id = p.id)
GROUP BY vote.post_id,
HAVING vote_count >= 2
AND vote_average >= 3
AND participant_count > 1
ORDER BY count DESC, average DESC;
Basically what I'm looking for to end up with, excluding the photo with only one participant:
VOTES
-----------------------------------------------------------
| photo_id | vote_count | average | participant_count
-----------------------------------------------------------
| 1 | 2 | 3 | 3
| 2 | 2 | 4.5 | 2
Update
It turned out this is a very inefficient way of trying to do what I want. Gordons answer below did solve the problem, but as soon as I wanted to join fields from the photos table as well, the "cartesian product"-issue became a real problem - it became a very heavy and slow query.
The solution I finally ended up with is adding a cache-field into the photos table keeping track of how many participants are in the photo. In other words I added a 'participant_count' field to 'photos' that is being updated every time a change is made to the participants table. I also run a cron-job regularly to make sure all photos 'participant_count' are properly up-to-date.
Upvotes: 0
Views: 32
Reputation: 1270391
First, you don't need left join
s for this. But that shouldn't affect the results. The problem is that you have a cartesian product, because you have two 1-n relationships to photos: votes and participants.
The proper way to fix this is by using subqueries:
SELECT pv.photo_id, pv.vote_count, pv.vote_average, pp.participant_count
FROM (SELECT pv.photo_id, count(*) AS vote_count, avg(grade) AS vote_average
FROM photo_votes pv
GROUP BY pv.photo_id
) pv
JOIN
(SELECT pp.photo_id, count(*) AS participant_count
FROM photo_participants p;
GROUP bY pv.photo_id
) pp
ON pv.photo_id = pp.photo_id
WHERE pv.vote_count >= 2 AND
pv.vote_average >= 3 AND
pp.participant_count > 1
ORDER BY pv.vote_count DESC, pv.vote_average DESC;
Note that you don't even need the photos
table, because you are not using any fields in it.
Upvotes: 2