SQL: filter a one-to-many query

Question

I have a many-to-many relationship in my DB (but for my question it could as well be a one-to-many hence the title) that links files with tags. The users are able to search the files using the tags as filters.

For example:

+ (a AND b) - c

means the I want all files which are tagged with both the tag a and the tag b, but not those with the tag c.

In pseudocode, that would be something like:

getFiles(filter) {
  let matching_files = [];
  for file in files {
    let tags = file.tags;
    if filter.match(tags) { // something non-trivial
       matching_files.push(file);
    }
  }
  matching_files
}

I just realized that what I want to do is a search based not a number of individual rows, but on a group of rows, which is not really what SQL is meant for.

I've got an idea on how to implement this, which would be by generating dynamic intersect/except based on the search query. So the previous query would be something like:

(SELECT file_id from files, tags WHERE tags.id = a) INTERSECT 
(SELECT file_id from files, tags WHERE tags.id = b) /* <- this handles + (a AND b) */ EXCEPT 
(SELECT file_id from files, tags WHERE tags.id = c) -- this one handles the - c

So + groups would be a bunch of unions, AND keywords would generate INTERSECTs, OR could be a simple OR in the WHERE clause of there group, and - would be an EXCEPT on everything that's been filtered in before.

So my question is this, is there a simpler way to do all of that ? Because first this seems pretty hacky, and second this does not look efficient at all when the number of files grows up, as I would be querying over all files for every group. The solution does not have to be SQL (if there's some other kind of datastorage that would handle this better), but please note I'm doing this for a desktop app (so using something I can embed with my program would be nice). Thanks !

Radim Bača · Accepted Answer

First of all you miss a JOIN condition in your queries such as tags.file_id = files.file_id. You can use EXISTS and NOT EXISTS to do the same job.

SELECT file_id 
FROM files 
WHERE EXISTS (
   SELECT 1 FROM tags 
   WHERE tags.file_id = files.file_id and tags.id = a 
) and EXISTS (
   SELECT 1 FROM tags 
   WHERE tags.file_id = files.file_id and tags.id = b 
) and NOT EXISTS (
   SELECT 1 FROM tags 
   WHERE tags.file_id = files.file_id and tags.id = a 
)

If you have a filter like + (a OR b) - c then it can be expressed using the following sql

SELECT file_id 
FROM files 
WHERE EXISTS (
   SELECT 1 FROM tags 
   WHERE tags.file_id = files.file_id and tags.id IN (a, b) 
) and NOT EXISTS (
   SELECT 1 FROM tags 
   WHERE tags.file_id = files.file_id and tags.id = a 
)

Efficiency is mainly related to the appropriate indexes. If you create index such as tags(id, file_id) then it should be fine (since you probably already have one on file(file_id)).

SQL: filter a one-to-many query

Answers (2)

Related Questions