Mitya
Mitya

Reputation: 34598

MySQL - match all tags rather than any

I have an SQL setup akin to the following:

ARTICLES

TAGS

...and a third table logging associations between the two, since there can be multiple tags to each article:

ARTICLE_TAG_ASSOCS

Via this question I managed to construct a query that would find articles that were tagged with at least one of a number of tags, e.g.

SELECT articles.*
FROM articles
JOIN article_tag_assocs ata ON articles.id = ata.article_id
JOIN tags ON tags.id = ata.tag_id
WHERE tags.tag = 'budgie' OR tags.tag = 'parrot';

Question: How can I alter the above to find articles that match ALL tags, i.e. both 'budgie' and 'parrot', not just one?

Clearly modifying the logic to

WHERE tags.tag = 'budgie' && tags.tag = 'parrot';

...is logically flawed, since MySQL is considering each tag in isolation, one at a time, but hopefully you get what I mean.

Upvotes: 1

Views: 2427

Answers (1)

spencer7593
spencer7593

Reputation: 108500

There are several workable approaches.

One approach is to perform separate JOIN operations for each tag. For example:

SELECT articles.*
  FROM articles

  JOIN article_tag_assocs ata
    ON ata.article_id = articles.id
  JOIN tags ta
    ON ta.id = ata.tag_id
   AND ta.tag = 'budgie'

  JOIN article_tag_assocs atb
    ON atb.article_id = articles.id
  JOIN tags tb
    ON tb.id = atb.tag_id
   AND tb.tag = 'parrot'

Note that this can return "duplicate" rows if a given articles is associated to the same tag value more than once. (Adding the DISTINCT keyword or a GROUP BY clause are ways to eliminate the duplicates.)


Another approach, if we are guaranteed that a given article has no duplicate tag values, is to use an inline view to get the list of article_id that are associated with both tags, and then JOIN that set to the articles table. For example:

SELECT a.*
  FROM ( SELECT ata.article_id
           FROM article_tag_assocs ata
           JOIN tags t
             ON t.id = ata.tag_id
          WHERE t.tag IN ('budgie','parrot')
          GROUP BY ata.article_id
         HAVING COUNT(1) = 2
       ) s
   JOIN articles a
     ON a.id = s.article_id 

Note that the literal "2" in the HAVING clause matches the number of values in the predicate on the tag column. The inline view (aliased as s) returns a distinct list of article_id, and we can join that to the articles table.

This approach is useful if you wanted to match, for example, at least three out of four tags. We could use lines like this in the inline view query.

          WHERE t.tag IN ('fee','fi','fo','fum')

         HAVING COUNT(1) >= 3

Then, any article that matched at least three of those four tags would be returned.

These aren't the only ways to return the specified result, there are several other approaches.


As Roland's answer pointed out, you can also do something like this:

   FROM articles a
  WHERE a.id IN ( <select article id values related to tag 'parrot'> )
    AND a.id IN ( <select article id values related to tag 'bungie'> )

You could also use an EXISTS clause with a correlated subquery, though this approach doesn't usually perform as well with large sets, due to the number of executions of the subquery

  FROM articles a
  WHERE EXISTS ( SELECT 1
                   FROM article_tag_assocs s1
                   JOIN tags t1 ON t1.tag = 'bungie'
                  WHERE s1.article_id = a.id
               )    
    AND EXISTS ( SELECT 1
                   FROM article_tag_assocs s2
                   JOIN tags t2 ON t2.tag = 'parrot'
                  WHERE s2.article_id = a.id
               )

NOTE: in this case, it is possible to reuse the same table aliases within each subquery, because it doesn't lead to ambiguity, though I still prefer distinct aliases because the table aliases show up in the EXPLAIN output, and the distinct aliases make it easier to match the rows in the EXPLAIN output to the references in the query.)

Upvotes: 8

Related Questions