Reputation: 123
I have a table of movie ratings with millions of rows, each holding a userId, a movieId and a rating.
| userId | movieId | rating |
|--------|---------|--------|
| 1      | 213     | 5      |
| 1      | 245     | 4      |
| 2      | 213     | 4      |
| 2      | 245     | 4      |
| 3      | 657     | 5      |
| 3      | 245     | 5      |
I'm trying to find a way to group together userIds that have matching sets of movieIds. Ideally, the query should only report a match if the users have at least 5 movieIds in common and the ratings are above 4, but I've simplified the requirements for this example.
In the example above, userIds 1 and 2 would be the only users that match, as they both have the same set of movieIds. I need a statement that would essentially replicate this. Thanks in advance for any help.
Upvotes: 1
Views: 50
Reputation: 125865
You can perform a self-join on matching movies, filter out records with uninteresting ratings, group by user-pairs and then filter the resulting groups for only those that have at least the requisite number of matching records:
SELECT a.userId, b.userId
FROM myTable a JOIN myTable b USING (movieId)
WHERE a.userId < b.userId
AND a.rating > 4
AND b.rating > 4
GROUP BY a.userId, b.userId
HAVING COUNT(*) >= 5
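To try the query above on the sample data from the question, here is a minimal sketch using Python's built-in sqlite3 module. The function name matching_pairs, its min_common and min_rating parameters, the SAMPLE_ROWS list and the extra COUNT(*) column are illustrative assumptions, not part of the original query. It also assumes each (userId, movieId) pair appears at most once in the table; if duplicates are possible, COUNT(DISTINCT movieId) would be the safer count.

```python
import sqlite3

# Sample rows from the question: (userId, movieId, rating)
SAMPLE_ROWS = [
    (1, 213, 5), (1, 245, 4),
    (2, 213, 4), (2, 245, 4),
    (3, 657, 5), (3, 245, 5),
]

def matching_pairs(rows, min_common=5, min_rating=4):
    """Return (userA, userB, shared) for user pairs that share at least
    min_common movies, counting only rows rated strictly above min_rating.
    Hypothetical helper; assumes one row per (userId, movieId)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE myTable (userId INTEGER, movieId INTEGER, rating INTEGER)"
    )
    conn.executemany("INSERT INTO myTable VALUES (?, ?, ?)", rows)
    query = """
        SELECT a.userId, b.userId, COUNT(*) AS shared
        FROM myTable a JOIN myTable b USING (movieId)
        WHERE a.userId < b.userId      -- report each pair once, skip self-pairs
          AND a.rating > ?
          AND b.rating > ?
        GROUP BY a.userId, b.userId
        HAVING COUNT(*) >= ?
        ORDER BY a.userId, b.userId
    """
    result = conn.execute(query, (min_rating, min_rating, min_common)).fetchall()
    conn.close()
    return result

if __name__ == "__main__":
    # Simplified thresholds from the question: 2 shared movies, any rating.
    print(matching_pairs(SAMPLE_ROWS, min_common=2, min_rating=0))  # [(1, 2, 2)]
```

With the real thresholds (at least 5 shared movies, ratings above 4) the six sample rows produce no pairs, which is expected for such a small table. On millions of rows, an index covering (movieId, userId, rating) would likely help the self-join, though that depends on the database engine.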
Upvotes: 2
Reputation: 1455
select movieId, rating
from tablename
group by movieId
having count(userId) > 1 and rating > 4;
This gives me movieId 245 with rating 5, which should be correct for your example data: the movie has more than one userId and a rating greater than 4.
Upvotes: 1