Reputation: 34598
I have an SQL setup akin to the following:
ARTICLES
TAGS
...and a third table logging associations between the two, since there can be multiple tags to each article:
ARTICLE_TAG_ASSOCS
Via this question I managed to construct a query that would find articles that were tagged with at least one of a number of tags, e.g.
SELECT articles.*
FROM articles
JOIN article_tag_assocs ata ON articles.id = ata.article_id
JOIN tags ON tags.id = ata.tag_id
WHERE tags.tag = 'budgie' OR tags.tag = 'parrot';
Question: How can I alter the above to find articles that match ALL tags, i.e. both 'budgie' and 'parrot', not just one?
Clearly modifying the logic to
WHERE tags.tag = 'budgie' && tags.tag = 'parrot';
...is logically flawed, since MySQL is considering each tag in isolation, one at a time, but hopefully you get what I mean.
Upvotes: 1
Views: 2427
Reputation: 108500
There are several workable approaches.
One approach is to perform separate JOIN operations for each tag. For example:
SELECT articles.*
FROM articles
JOIN article_tag_assocs ata
ON ata.article_id = articles.id
JOIN tags ta
ON ta.id = ata.tag_id
AND ta.tag = 'budgie'
JOIN article_tag_assocs atb
ON atb.article_id = articles.id
JOIN tags tb
ON tb.id = atb.tag_id
AND tb.tag = 'parrot'
Note that this can return "duplicate" rows if a given articles is associated to the same tag value more than once. (Adding the DISTINCT
keyword or a GROUP BY
clause are ways to eliminate the duplicates.)
Another approach, if we are guaranteed that a given article has no duplicate tag values, is to use an inline view to get the list of article_id
that are associated with both tags, and then JOIN that set to the articles
table. For example:
SELECT a.*
FROM ( SELECT ata.article_id
FROM article_tag_assocs ata
JOIN tags t
ON t.id = ata.tag_id
WHERE t.tag IN ('budgie','parrot')
GROUP BY ata.article_id
HAVING COUNT(1) = 2
) s
JOIN articles a
ON a.id = s.article_id
Note that the literal "2" in the HAVING clause matches the number of values in the predicate on the tag
column. The inline view (aliased as s
) returns a distinct list of article_id
, and we can join that to the articles
table.
This approach is useful if you wanted to match, for example, at least three out of four tags. We could use lines like this in the inline view query.
WHERE t.tag IN ('fee','fi','fo','fum')
HAVING COUNT(1) >= 3
Then, any article that matched at least three of those four tags would be returned.
These aren't the only ways to return the specified result, there are several other approaches.
As Roland's answer pointed out, you can also do something like this:
FROM articles a
WHERE a.id IN ( <select article id values related to tag 'parrot'> )
AND a.id IN ( <select article id values related to tag 'bungie'> )
You could also use an EXISTS clause with a correlated subquery, though this approach doesn't usually perform as well with large sets, due to the number of executions of the subquery
FROM articles a
WHERE EXISTS ( SELECT 1
FROM article_tag_assocs s1
JOIN tags t1 ON t1.tag = 'bungie'
WHERE s1.article_id = a.id
)
AND EXISTS ( SELECT 1
FROM article_tag_assocs s2
JOIN tags t2 ON t2.tag = 'parrot'
WHERE s2.article_id = a.id
)
NOTE: in this case, it is possible to reuse the same table aliases within each subquery, because it doesn't lead to ambiguity, though I still prefer distinct aliases because the table aliases show up in the EXPLAIN output, and the distinct aliases make it easier to match the rows in the EXPLAIN output to the references in the query.)
Upvotes: 8