Reputation: 49703
Let's say I have the following tables:
id: integer
name: string
id: integer
body: text
id: integer
tag_id: integer
post_id: integer
How would I go about writing a query that select all posts that are tagged with ALL of the following tags (name attribute of tags table): "Cheese", "Wine", "Paris", "Frace", "City", "Scenic", "Art"
See also: Sql query to find things with most specified tags (note: similar, but not a duplicate!)
Upvotes: 13
Views: 5089
Reputation: 146499
Try this:
Select * From Posts p
Where Not Exists
(Select * From tags t
Where name in
('Cheese', 'Wine', 'Paris',
'Frace', 'City', 'Scenic', 'Art')
And Not Exists
(Select * From taggings
Where tag_id = t.Tag_Id
And post_Id = p.Post_Id))
Explanation: Asking for a list of those Posts that have had every one of a specified set of tags associated with it is equivilent to asking for those posts where there is no tag in that same specified set, that has not been associated with it. i.e., the sql above.
Upvotes: 2
Reputation: 332571
SELECT p.*
FROM POSTS p
WHERE p.id IN (SELECT tg.post_id
FROM TAGGINGS tg
JOIN TAGS t ON t.id = tg.tag_id
WHERE t.name IN ('Cheese','Wine','Paris','Frace','City','Scenic','Art')
GROUP BY tg.post_id
HAVING COUNT(DISTINCT t.name) = 7)
SELECT p.*
FROM POSTS p
JOIN (SELECT tg.post_id
FROM TAGGINGS tg
JOIN TAGS t ON t.id = tg.tag_id
WHERE t.name IN ('Cheese','Wine','Paris','Frace','City','Scenic','Art')
GROUP BY tg.post_id
HAVING COUNT(DISTINCT t.name) = 7) x ON x.post_id = p.id
SELECT p.*
FROM POSTS p
WHERE EXISTS (SELECT NULL
FROM TAGGINGS tg
JOIN TAGS t ON t.id = tg.tag_id
WHERE t.name IN ('Cheese','Wine','Paris','Frace','City','Scenic','Art')
AND tg.post_id = p.id
GROUP BY tg.post_id
HAVING COUNT(DISTINCT t.name) = 7)
The crux of things is that the COUNT(DISTINCT t.name)
needs to match the number of tag names to ensure that all those tags are related to the post. Without the DISTINCT, there's a risk that duplicates of one of the names could return a count of 7--so you'd have a false positive.
Most will tell you the JOIN is optimal, but JOINs also risk duplicating rows in the resultset. EXISTS would be my next choice--no duplicate risk, and generally faster execution but checking the explain plan will ultimately tell you what's best based on your setup and data.
Upvotes: 24