Imashcha
Imashcha

Reputation: 324

Selecting items according to tags table with synonymous rows

I have three tables: articles, tags and articles_tags(junction).

inside tags I have 3 (relevant) columns: id, name, and alias. The alias column contains the ID of an earlier synonymous tag, if exists (NULL otherwise). There are no loops or chains: all synonymous tags contain the same alias. E.g.:

tags

id    |    name    |    alias
------------------------------------
------------------------------------
8     |   pencil   |    NULL
------------------------------------
------------------------------------
3072  |   pencils  |      8
------------------------------------
------------------------------------
3073  | blue pencil|      8
------------------------------------

(The tags MUST have this feature because they are user-typed.)

Now, let's say I want to perform a search for all articles containing the tags pencils or cats. I would like them to include ALL other synonymous tags (pencil, blue pencil, tomcat, kitten etc.).

The query I came up with is as following:

SELECT * FROM `articles` WHERE id IN
    (SELECT article_id FROM `articles_tags` WHERE id IN
         (SELECT id FROM `tags` WHERE COALESCE(alias, id) IN
            (SELECT id FROM `tags` WHERE name IN
                ("pencils", "cats")
        )
    )
)

I believe it is working, but I think there might a better way then using 4 subqueries with INs (performance is key, because this will be a common search to perform).

I would very much appreciate a guidance towards a better solution.

Thanks.

Bonus Question

What if I need to find articles that contain no other tags than the ones searched for?

That is, Selecting all articles that have only the tags "cats" and "dogs" will return articles whose only tags are either ("cats"), ("dogs"), or ("cats", "dogs").

EDIT: table structures:

id    |    name    |    alias
------------------------------------
...
------------------------------------
8     |   pencil   |    NULL
------------------------------------
...
------------------------------------
3072  |   pencils  |      8 (pencil)
------------------------------------
------------------------------------
3073  | blue pencil|      8 (pencil)
------------------------------------
------------------------------------
6088  |    cats    |     NULL
------------------------------------
------------------------------------
7098  |    dogs    |     NULL
------------------------------------
------------------------------------
7099  |  kittens   |     6088 (cats)
------------------------------------
------------------------------------
7102  |  chiwawa   |     7098 (dogs)
------------------------------------

articles

id    |        title       |    content
----------------------------------------
...
----------------------------------------
1     |   I love writing   |    ...
----------------------------------------
...
----------------------------------------
42    |Tips for pet owners |    ...
----------------------------------------
----------------------------------------
108   |  Drawing my dog    |    ...
----------------------------------------

articles_tags

id    |      article_id      |    tag_id
------------------------------------
...
------------------------------------
19    |   1(I love writing)  |    3072 (pencils)
------------------------------------
------------------------------------
21    |   1(I love writing)  |    3010 (poetry)
------------------------------------
------------------------------------
22    |   1(I love writing)  |    123  (books)
------------------------------------
------------------------------------
34    | 42(Tips for pet ...) |    6088 (cats)
------------------------------------
------------------------------------
35    | 42(Tips for pet ...) |    7098 (dogs)
------------------------------------
...
------------------------------------
78    | 108(Drawing my dog) |     7098 (dogs)
------------------------------------
------------------------------------
78    | 108(Drawing my dog) |     8    (pencil)
------------------------------------

desired output:

Searching for chiwawa, cats should yield "Tips for pet owners", "Drawing my dog". Searching for pencil, dogs should yield "I love writing", "Drawing my dog".

And for the second (bonus) part: Searching for chiwawa, kittens should yield ONLY "Tips for pet owners", and NOT "Drawing my dog" because it also has a pencils tag.

sorry for the amount of text.

Upvotes: 0

Views: 71

Answers (2)

jleach
jleach

Reputation: 7800

You've essentially got two issues here... the first being relational division - selecting tags that apply to a parent record, which in itself can be a bit of fun. Here's two relevant links:

https://www.simple-talk.com/sql/t-sql-programming/divided-we-stand-the-sql-of-relational-division/ (Celko)

The answer in this question provides roughly a dozen different ways to query this (and performance benchmarks using PostgreSQL). Quite informative:

How to filter SQL results in a has-many-through relation

As for "pencil vs. pencils" type tags, I'd suggest handling this via application on input. Create a "filter" of sorts that maps tags to other inherent tags. So, when someone enters a "pencils" tag, it automatically either a) filters that to "pencil", or b) enters both pencils and pencil. This is especially helpful for something like "blue pencil". Given the relative complexity of the relational division problem as stated above, I think you'll find yourself with a much easier to manage data model if "blue pencil" added both "pencil" and "blue pencil" (and perhaps "blue") tags into the database.

There's a certain point where logic should become part of the application domain and stay out of the database. This is a good case of such, I believe. Else you risk trying to force a database engine into something it wasn't really designed to do, and that's no fun at all.

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1271003

You can just get the article id by doing:

select ata.article_id
from articles_tags ata join
     tags t
     on ata.tag_id = t.id
where t.name in ('pencil', 'cats')
group by ata.article_id
having count(distinct t.name) = 2;

For your second question, you can use conditional aggregation. Here is one method:

select ata.article_id
from articles_tags ata join
     tags t
     on ata.tag_id = t.id
where t.name in ('pencil', 'cats')
group by ata.article_id
having sum(t.name = 'pencil') > 0 and sum(t.name = 'cats') > 0;

You can join in articles to get additional columns if you need them.

Upvotes: 1

Related Questions