Reputation: 581
We are developing an local shop reccomendation system and in one of our sql queries we had a problem We want to fetch the companies which all users in same cluster rated , but if any one of the users in the same group doesnt rated the company we wouldnt want to fetch it
SELECT ml_user_clusters.primaryUser,ml_user_clusters.clusterId,ml_ratings.companyId,ml_ratings.rating,ml_user_labels.groupId FROM ml_user_clusters
LEFT JOIN ml_ratings ON ml_ratings.userId = ml_user_clusters.primaryUser
LEFT JOIN ml_company_user_labels ON ml_company_user_labels.companyId = ml_ratings.companyId
LEFT JOIN ml_user_labels ON ml_user_labels.groupId = ml_company_user_labels.labelId
WHERE ml_user_clusters.clusterId = 0
We've started to add a query like in the below but couldnt able to finish it with proper AND clause
And our data is like in the below: So in the result we would like to have only the companies which has groupId=6 because all users in the same cluster(clusterId=0) rated a company with groupId = 6
primaryUser clusterId companyId rating groupId
497 0 135 5 NULL
498 0 135 10 NULL
79 0 135 12 NULL
501 0 135 10 NULL
79 0 85 14 2
79 0 8 4 5
79 0 98 11 5
79 0 3 5 5
497 0 6 7 6
500 0 6 7 6
499 0 29 7 6
497 0 29 7 6
499 0 77 7 6
500 0 29 7 6
498 0 6 7 6
500 0 77 11 6
500 0 130 3 6
498 0 130 3 6
501 0 77 19 6
499 0 6 7 6
79 0 30 1 7
500 0 30 7 7
79 0 48 7 9
79 0 39 1 13
79 0 48 7 13
499 0 6 7 15
497 0 6 7 15
79 0 8 4 15
500 0 6 7 15
79 0 98 11 15
498 0 6 7 15
79 0 3 5 15
79 0 81 7 15
79 0 3 5 17
79 0 82 7 17
79 0 103 7 17
79 0 118 3 17
79 0 63 3 17
501 0 118 7 17
79 0 82 7 19
79 0 118 3 19
79 0 63 3 19
501 0 118 7 19
79 0 39 1 21
79 0 85 14 23
Expected output must be: (Because all unique users in Cluster=0 has rated a company which has GroupID=6 )
primaryUser clusterId companyId rating groupId
497 0 6 7 6
500 0 6 7 6
499 0 29 7 6
497 0 29 7 6
499 0 77 7 6
500 0 29 7 6
498 0 6 7 6
500 0 77 11 6
500 0 130 3 6
498 0 130 3 6
501 0 77 19 6
499 0 6 7 6
Do you have any idea how we can fix that problem?
Upvotes: 0
Views: 80
Reputation: 26784
Something like this should work,you should build a fiddle for better testing.
Explanation: you count distinct users grouped by group id and compare with the total number of distinct users.If the two match it means all users in that respective groupid have voted.
SELECT ml_user_labels.groupId
FROM ml_user_clusters
LEFT JOIN ml_ratings ON ml_ratings.userId = ml_user_clusters.primaryUser
LEFT JOIN ml_company_user_labels ON ml_company_user_labels.companyId = ml_ratings.companyId
LEFT JOIN ml_user_labels ON ml_user_labels.groupId = ml_company_user_labels.labelId
WHERE ml_user_clusters.clusterId = 0
GROUP BY ml_user_labels.groupId
HAVING COUNT(DISTINCT ml_user_clusters.primaryUser) =
(SELECT COUNT(DISTINCT ml_user_clusters.primaryUser)
FROM ml_user_clusters
LEFT JOIN ml_ratings ON ml_ratings.userId = ml_user_clusters.primaryUser
LEFT JOIN ml_company_user_labels ON ml_company_user_labels.companyId = ml_ratings.companyId
LEFT JOIN ml_user_labels ON ml_user_labels.groupId = ml_company_user_labels.labelId
WHERE ml_user_clusters.clusterId = 0)x
Upvotes: 1