Reputation: 324
I have three tables: articles, tags and articles_tags(junction).
inside tags
I have 3 (relevant) columns:
id
, name
, and alias
.
The alias
column contains the ID of an earlier synonymous tag, if exists (NULL
otherwise). There are no loops or chains: all synonymous tags contain the same alias. E.g.:
tags
id | name | alias
------------------------------------
------------------------------------
8 | pencil | NULL
------------------------------------
------------------------------------
3072 | pencils | 8
------------------------------------
------------------------------------
3073 | blue pencil| 8
------------------------------------
(The tags MUST have this feature because they are user-typed.)
Now, let's say I want to perform a search for all articles containing the tags pencils
or cats
. I would like them to include ALL other synonymous tags (pencil, blue pencil, tomcat, kitten etc.).
The query I came up with is as following:
SELECT * FROM `articles` WHERE id IN
(SELECT article_id FROM `articles_tags` WHERE id IN
(SELECT id FROM `tags` WHERE COALESCE(alias, id) IN
(SELECT id FROM `tags` WHERE name IN
("pencils", "cats")
)
)
)
I believe it is working, but I think there might a better way then using 4 subqueries with INs (performance is key, because this will be a common search to perform).
I would very much appreciate a guidance towards a better solution.
Thanks.
What if I need to find articles that contain no other tags than the ones searched for?
That is, Selecting all articles that have only the tags "cats" and "dogs" will return articles whose only tags are either ("cats"), ("dogs"), or ("cats", "dogs").
EDIT: table structures:
id | name | alias
------------------------------------
...
------------------------------------
8 | pencil | NULL
------------------------------------
...
------------------------------------
3072 | pencils | 8 (pencil)
------------------------------------
------------------------------------
3073 | blue pencil| 8 (pencil)
------------------------------------
------------------------------------
6088 | cats | NULL
------------------------------------
------------------------------------
7098 | dogs | NULL
------------------------------------
------------------------------------
7099 | kittens | 6088 (cats)
------------------------------------
------------------------------------
7102 | chiwawa | 7098 (dogs)
------------------------------------
id | title | content
----------------------------------------
...
----------------------------------------
1 | I love writing | ...
----------------------------------------
...
----------------------------------------
42 |Tips for pet owners | ...
----------------------------------------
----------------------------------------
108 | Drawing my dog | ...
----------------------------------------
id | article_id | tag_id
------------------------------------
...
------------------------------------
19 | 1(I love writing) | 3072 (pencils)
------------------------------------
------------------------------------
21 | 1(I love writing) | 3010 (poetry)
------------------------------------
------------------------------------
22 | 1(I love writing) | 123 (books)
------------------------------------
------------------------------------
34 | 42(Tips for pet ...) | 6088 (cats)
------------------------------------
------------------------------------
35 | 42(Tips for pet ...) | 7098 (dogs)
------------------------------------
...
------------------------------------
78 | 108(Drawing my dog) | 7098 (dogs)
------------------------------------
------------------------------------
78 | 108(Drawing my dog) | 8 (pencil)
------------------------------------
desired output:
Searching for chiwawa
, cats
should yield "Tips for pet owners", "Drawing my dog".
Searching for pencil
, dogs
should yield "I love writing", "Drawing my dog".
And for the second (bonus) part:
Searching for chiwawa
, kittens
should yield ONLY "Tips for pet owners", and NOT "Drawing my dog" because it also has a pencils
tag.
sorry for the amount of text.
Upvotes: 0
Views: 71
Reputation: 7800
You've essentially got two issues here... the first being relational division - selecting tags that apply to a parent record, which in itself can be a bit of fun. Here's two relevant links:
https://www.simple-talk.com/sql/t-sql-programming/divided-we-stand-the-sql-of-relational-division/ (Celko)
The answer in this question provides roughly a dozen different ways to query this (and performance benchmarks using PostgreSQL). Quite informative:
How to filter SQL results in a has-many-through relation
As for "pencil vs. pencils" type tags, I'd suggest handling this via application on input. Create a "filter" of sorts that maps tags to other inherent tags. So, when someone enters a "pencils" tag, it automatically either a) filters that to "pencil", or b) enters both pencils and pencil. This is especially helpful for something like "blue pencil". Given the relative complexity of the relational division problem as stated above, I think you'll find yourself with a much easier to manage data model if "blue pencil" added both "pencil" and "blue pencil" (and perhaps "blue") tags into the database.
There's a certain point where logic should become part of the application domain and stay out of the database. This is a good case of such, I believe. Else you risk trying to force a database engine into something it wasn't really designed to do, and that's no fun at all.
Upvotes: 1
Reputation: 1271003
You can just get the article id by doing:
select ata.article_id
from articles_tags ata join
tags t
on ata.tag_id = t.id
where t.name in ('pencil', 'cats')
group by ata.article_id
having count(distinct t.name) = 2;
For your second question, you can use conditional aggregation. Here is one method:
select ata.article_id
from articles_tags ata join
tags t
on ata.tag_id = t.id
where t.name in ('pencil', 'cats')
group by ata.article_id
having sum(t.name = 'pencil') > 0 and sum(t.name = 'cats') > 0;
You can join in articles
to get additional columns if you need them.
Upvotes: 1