Antony Sastre
Antony Sastre

Reputation: 617

How do I get multiple COUNT with multiple JOINS and multiple conditions?

I have SQL (MySQL) that I've can't figure out. The application is using uploaded photos where there are many tagged participants in a photo and there is the possibility to give photos a vote between 1 to 5.

The original query gets all the votes for a photo and orders them by amount of votes and the average of those votes.

Now I need to limit the returned photos by the ones with more than 1 participant. So photos with only 1 participant should not be accounted for.

Simplified schema looks like this.

PHOTOS
----------------------
| id   | title       |
----------------------
| 1    | Fun stuff   |
| 2    | Crazy girls |
| 3    | Single boy  |


PHOTO_VOTES
-------------------------------------------
| photo_id   | grade    | date  | user_id |
-------------------------------------------
| 1          | 3        | …     | 12      |
| 1          | 3        | …     | 12      |
| 2          | 5        | …     | 14      |
| 2          | 4        | …     | 14      |
| 3          | 4        | …     | 15      |
| 3          | 4        | …     | 18      |


PHOTO_PARTICIPANTS
-------------------------
| photo_id   | user_id  |
-------------------------
| 1          | 12       |
| 1          | 21       |
| 1          | 33       |
| 2          | 14       |
| 2          | 33       |
| 3          | 12       |

This is how far I got:

SELECT vote.photo_id,
  COUNT(vote.photo_id) AS vote_count,
  AVG(vote.grade) AS vote_average,
  COUNT(pp.photo_id) AS participant_count

 FROM photo_votes vote

  LEFT JOIN photos p ON (vote.photo_id = p.id)
  LEFT JOIN photo_participants pp ON (pp.photo_id = p.id)

  GROUP BY vote.post_id, 
   HAVING vote_count >= 2
   AND vote_average >= 3
   AND participant_count > 1

  ORDER BY count DESC, average DESC;

Basically what I'm looking for to end up with, excluding the photo with only one participant:

VOTES
-----------------------------------------------------------
| photo_id   | vote_count     | average  | participant_count
-----------------------------------------------------------
| 1          | 2              | 3        | 3
| 2          | 2              | 4.5      | 2

Update

It turned out this is a very inefficient way of trying to do what I want. Gordons answer below did solve the problem, but as soon as I wanted to join fields from the photos table as well, the "cartesian product"-issue became a real problem - it became a very heavy and slow query.

The solution I finally ended up with is adding a cache-field into the photos table keeping track of how many participants are in the photo. In other words I added a 'participant_count' field to 'photos' that is being updated every time a change is made to the participants table. I also run a cron-job regularly to make sure all photos 'participant_count' are properly up-to-date.

Upvotes: 0

Views: 32

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1270391

First, you don't need left joins for this. But that shouldn't affect the results. The problem is that you have a cartesian product, because you have two 1-n relationships to photos: votes and participants.

The proper way to fix this is by using subqueries:

SELECT pv.photo_id, pv.vote_count, pv.vote_average, pp.participant_count
FROM (SELECT pv.photo_id, count(*) AS vote_count, avg(grade) AS vote_average
       FROM photo_votes pv
       GROUP BY pv.photo_id
      ) pv 
JOIN
      (SELECT pp.photo_id, count(*) AS participant_count
       FROM photo_participants p;
       GROUP bY pv.photo_id
      ) pp
      ON pv.photo_id = pp.photo_id
WHERE pv.vote_count >= 2 AND
      pv.vote_average >= 3 AND
      pp.participant_count > 1
ORDER BY pv.vote_count DESC, pv.vote_average DESC;

Note that you don't even need the photos table, because you are not using any fields in it.

Upvotes: 2

Related Questions