damianb
damianb

Reputation: 1224

"Tag" searching/exclusion query design issue

Background: I'm working on a homebrew project for managing a collection of my own images, and have been trying to implement a tag-based search so I can easily sift through them.

Right now, I'm working with RedBean's tagging API for applying tags to each image's database entry, however I'm stuck on a specific detail of my implementation; currently, to allow search of tags where multiple tags will refine the search (when searching for "ABC XYZ", tagged image must have tags "ABC" and "XYZ"),

I'm having to handle some of the processing in the server-side language and not SQL, and then run an (optional) second query to verify that any returned images don't have a tag that has been explicitly excluded from results. (when searching for "ABC -XYZ", tagged image must have tag "ABC" and not "XYZ").

The problem here is that my current method requires that I run all results by my server-side code, and makes any attempts at sensible pagination/result offsets inaccurate.

My goal is to just grab the rows of the post table that contain the requested tags (and not contain any excluded tags) with one query, and still be able to use LIMIT/OFFSET arguments for my query to obtain reasonably paginated results.

Table schemas is as follows:

Table "post"
Columns:
  id (PRIMARY KEY for post table)
  (image metadata, not relevant to tag search)

Table "tag"
Columns:
  id (PRIMARY KEY for tag table)
  title (string of varying length - assume varchar(255))

Table "post_tag"
Columns:
  id (PRIMARY KEY for post_tag table)
  post_id (associated with column "post.id")
  tag_id (associated with column "tag.id")

If possible, I'd like to also be able to have WHERE conditions specific to post table columns as well.

What should I be using for query structure? I've been playing with left joins but haven't been able to get the exact structure I need to solve this.

Upvotes: 2

Views: 907

Answers (1)

D'Arcy Rittich
D'Arcy Rittich

Reputation: 171371

Here is the basic idea:

The LEFT OUTER JOIN is the set of posts that match tags you want to exclude. The final WHERE clause in the query makes sure that none of these posts match the entries in the first post table.

The INNER JOIN is the set of posts that match all of the tags. Note that the number two must match the number of unique tag names that you provide in the IN clause.

select p.*
from post p
left outer join (
    select pt.post_id    
    from post_tag pt
    inner join tag t on pt.tag_id = t.id
    where t.title in ('UVW', 'XYZ')
) notag on p.id = notag.post_id 
inner join (
    select pt.post_id    
    from post_tag pt
    inner join tag t on pt.tag_id = t.id
    where t.title in ('ABC', 'DEF')
    group by pt.post_id
    having count(distinct t.title) = 2
) yestag on p.id = yestag.post_id 
where notag.post_id is null
--add additional WHERE filters here as needed

Upvotes: 2

Related Questions