Reputation: 217
I have articles on my site, and I would like to add tags which would describe each article, but I'm having problems with design mysql table for tags. I have two ideas:
So when I want tags for article with ID 1, I would run
SELECT ... FROM tags WHERE `article_id`=1;
But, I would also like to know 3 most similar articles by comparing tags, so if I have article which has tags "php,mysql,erlang", and 5 articles with tags: "php,mysql", "erlang,ruby", "php erlang", "mysql,erlang,javascript", I would choose 1., 3. and 4., since those 3 have most same tags with main article.
Also other question, what is the best way to get 10 "most used tags" ?
Upvotes: 8
Views: 6619
Reputation: 409
yes, but you didn't answer my main question, how to get 3 most similar articles?
Answer: Just look for the same tag ids in the merged table (tags_articles). Gather them and create a pattern.
For example: Article 1 has tags: 1,2 Article 2 has tags: 2,3,4 Article 5 has tags: 6,7,2 Article 7 has tags: 7,1,2,3
If you want the 3 most similar articles for article 1, you have to look for the tags 1,2. You'll find Article 7 is most similar and 2 and 5 have some similarities.
Upvotes: 0
Reputation: 60508
First off, you'll want to use Pascal MARTIN's suggestion about the table design.
As for finding similar articles, here's something to get you started. Given that @article_id is the article you want to find matches for, and @tag1, @tag2, @tag3 are the tags for that article:
SELECT article_id, count(*)
FROM tags_articles
WHERE article_id <> @article_id
AND tag_id IN (@tag1, @tag2, @tag3)
GROUP BY article_id
ORDER BY count(*) DESC
LIMIT 3
Upvotes: 1
Reputation: 400922
Generally, for this kind of many-to-many relationship, there are three tables :
article
" table
tag
" table
tags_articles
" table, which acts as a join table, and contains only :
id_article
: foreign key that points to an articleid_tag
: foreign key that points to a tag
This way, there is no duplication of any tag's data : for each tag, there is one, and only one, line in the tag
table.
And, for each article, you can have several tags (i.e. several lines in the tags_articles
table) ; and, of course, for each tags, you can have several articles.
Getting a list of tags for an article, with this idea, is a matter of an additionnal query, like :
select tag.*
from tag
inner join tags_articles on tag.id = tags_articles.id_tag
where tags_articles.id_article = 123
Getting the three "most similar" articles would mean :
Not tested, but an idea might be something that would look like this :
select article.id, count(*) as nb_identical_tags
from article
inner join tags_articles on tags_articles.id_article = article.id
inner join tag on tag.id = tags_articles.id_tag
where tag.name in ('php', 'mysql', 'erlang')
and article.id <> 123
group by article.id
order by count(*) desc
limit 3
Basically, you :
where
clause, without the group by
clause, there would be two lines for that articlegroup by article.id
, there will be only one line per article
count
, to find out how many tags each article has in common with the initial oneUpvotes: 18