Nathan Wiles
Nathan Wiles

Reputation: 926

Filtering tags with COUNT

I have the following query which filters out rows of threads based on a series of constraints:

SELECT *
    FROM threads th 
        JOIN v_threads_with_tags AS twt      /*Only if filtering by tags*/
            ON th.thread_id = twt.thread_id
        JOIN v_friends AS f                  /*Only if filtering to threads from friends*/
            ON th.owner_id = f.friend 
        LEFT JOIN v_visible_threads AS v        
            ON v.thread_id = th.thread_id
    WHERE (v.viewer_id = 43 || v.viewer_id IS NULL) 
    && (v.friend_id = 43 || v.friend_id IS NULL)      
    && user = 43;                          
    && tag_name IN ('foo','bar')

Parts of the query I haven't had a chance to test, but I can say for sure that tag names are not being filtered completely. This query as is will return a result set similar to this (only relevant columns listed):

thread_id | tag_name
    1         foo
    1         bar
    2         foo
    3         foo

I want a result set of only thread_id's which are linked to ALL tags listed in the query, and obviously I can't have duplicates. In the example above, I would want a result set of only one instance of thread #1.

I've seen similar questions asked on stackoverflow (please read on before marking this as a duplicate), and while the solutions offered vary greatly, the general route seems to often involve adding the following statement to the end of the query:

HAVING COUNT('tag_name') = 2

I've also tried the following similar query:

SELECT th.thread_id,th.owner_id,th.message,th.time,tag_name,viewer_id,v.friend_id
    FROM threads th 
        LEFT JOIN v_visible_threads AS v
            ON v.thread_id = th.thread_id
    WHERE (v.viewer_id = 43 || v.viewer_id IS NULL) 
    && (v.friend_id = 43 || v.friend_id IS NULL)
    && th.thread_id IN 
    (
        SELECT thread_id FROM v_threads_with_tags
        WHERE tag_name IN ('foo','bar')
        HAVING COUNT(tag_name) = 2
    )

I have trouble understanding the use of COUNT() in either of these. In this result set, column tag_name has four values, so I would expect COUNT(tag_name) to return 4, no matter what the value of tag_name in a particular row might be. This is in fact the value it returns, so the statement causes the query to return an empty set.

Still, I'm seeing this statement being used EVERYWHERE to solve this kind of issue, so I have to assume everyone is using it correctly and that I'm missing something.

Can someone explain to me whether or not I'm understanding COUNT correctly, and what methods I might use to finish my query?

Upvotes: 0

Views: 32

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1270301

If you want threads that have all tags, you can't get that using just where clause. Instead, aggregate by the thread_id and count the number of matches for each tag. Return the threads that have at least one match for each tag that you care about:

SELECT th.thread_id
FROM threads th JOIN
     v_threads_with_tags twt      /*Only if filtering by tags*/
     ON th.thread_id = twt.thread_id JOIN
     v_friends f                  /*Only if filtering to threads from friends*/
     ON th.owner_id = f.friend LEFT JOIN
     v_visible_threads AS v        
     ON v.thread_id = th.thread_id
WHERE (v.viewer_id = 43 || v.viewer_id IS NULL) and
      (v.friend_id = 43 || v.friend_id IS NULL) and   
     user = 43
group by th.thread_id
having sum(tag_name = 'foo') > 0 and
       sum(tag_name = 'bar') > 0;               

Upvotes: 1

Related Questions