Reputation: 339
I have a many-to-many relationship in my DB (but for my question it could as well be a one-to-many hence the title) that links files with tags. The users are able to search the files using the tags as filters.
For example:
+ (a AND b) - c
means the I want all files which are tagged with both the tag a
and the tag b
, but not those with the tag c
.
In pseudocode, that would be something like:
getFiles(filter) {
let matching_files = [];
for file in files {
let tags = file.tags;
if filter.match(tags) { // something non-trivial
matching_files.push(file);
}
}
matching_files
}
I just realized that what I want to do is a search based not a number of individual rows, but on a group of rows, which is not really what SQL is meant for.
I've got an idea on how to implement this, which would be by generating dynamic intersect/except based on the search query. So the previous query would be something like:
(SELECT file_id from files, tags WHERE tags.id = a) INTERSECT
(SELECT file_id from files, tags WHERE tags.id = b) /* <- this handles + (a AND b) */ EXCEPT
(SELECT file_id from files, tags WHERE tags.id = c) -- this one handles the - c
So +
groups would be a bunch of unions, AND
keywords would generate INTERSECT
s, OR
could be a simple OR
in the WHERE
clause of there group, and -
would be an EXCEPT
on everything that's been filtered in before.
So my question is this, is there a simpler way to do all of that ? Because first this seems pretty hacky, and second this does not look efficient at all when the number of files grows up, as I would be querying over all files for every group. The solution does not have to be SQL (if there's some other kind of datastorage that would handle this better), but please note I'm doing this for a desktop app (so using something I can embed with my program would be nice). Thanks !
Upvotes: 0
Views: 1759
Reputation: 10701
First of all you miss a JOIN
condition in your queries such as tags.file_id = files.file_id
. You can use EXISTS
and NOT EXISTS
to do the same job.
SELECT file_id
FROM files
WHERE EXISTS (
SELECT 1 FROM tags
WHERE tags.file_id = files.file_id and tags.id = a
) and EXISTS (
SELECT 1 FROM tags
WHERE tags.file_id = files.file_id and tags.id = b
) and NOT EXISTS (
SELECT 1 FROM tags
WHERE tags.file_id = files.file_id and tags.id = a
)
If you have a filter like + (a OR b) - c then it can be expressed using the following sql
SELECT file_id
FROM files
WHERE EXISTS (
SELECT 1 FROM tags
WHERE tags.file_id = files.file_id and tags.id IN (a, b)
) and NOT EXISTS (
SELECT 1 FROM tags
WHERE tags.file_id = files.file_id and tags.id = a
)
Efficiency is mainly related to the appropriate indexes. If you create index such as tags(id, file_id)
then it should be fine (since you probably already have one on file(file_id)
).
Upvotes: 2
Reputation: 1269503
If you want to use intersect
, you can do:
select file_id
from file_tags
where file_tags.tag_id = a
intersect
select file_id
from file_tags
where file_tags.tag_id = b
intersect
select file_id
from file_tags
where file_tags.tag_id = c;
I would normally do this with group by
and having
:
select file_id
from file_tags
where tag_id in (a, b, c)
having count(*) = 3;
Upvotes: 0