Reputation: 793
In mysql, I am using "join tables" to assign tags to items. I would like to see which items have the most similar tags to the item being viewed.
For example, let's say the item of interest has been tagged "cool", "cars" and "red". I would like to search for other items with those tags. I want to see items that have been tagged "cars", but I want items that were tagged "cars" and "red" to be above the item that was only tagged "cars". I want items with identical tags to be at the top of the results.
Is there some way to compare a data set (subquery) with another data set (subquery) using IN? Alternately, is there some trick I can use to evaluate them as comma separated lists using GROUP BY and GROUP_CONCAT()?
Upvotes: 1
Views: 377
Reputation: 37813
How about this:
SELECT post, SUM(IF(tag IN ('cool', 'cars', 'red'), 1, 0)) AS number_matching
FROM tags
GROUP BY post
ORDER BY number_matching DESC
Here the list of terms could be filled into the SQL from your application if you have it handy already, or could be generated from a subquery.
Upvotes: 2
Reputation: 24096
It would help if you show us your table structures, so I can be more specific.
I'm assuming you've got a structure that resembles this:
Table item: (id, itemname)
1 item1
2 item2
3 item3
4 item4
5 item5
Table tag: (id, tagname)
1 cool
2 red
3 car
Table itemtag: (id, itemid, tagid)
1 1 2 (=item1, red)
2 2 1 (=item2, cool)
3 2 3 (=item2, car)
4 3 1 (=item3, cool)
5 3 2 (=item3, red)
6 3 3 (=item3, car)
7 4 3 (=item3, car)
8 5 3 (=item3, car)
In general my approach would be to start out by counting each separate tag.
-- make a list of how often a tag was used:
select tagid, count(*) as `tagscore` from itemtag group by tagid
This shows a row for each tag that was assigned to the item, with a score.
In our example, that would be:
tag tagscore
1 2 (cool, 2x)
2 2 (red, 2x)
3 4 (car, 4x)
set @ItemOfInterest=2;
select
itemname,
sum(tagscore) as `totaltagscore`,
GROUP_CONCAT(tags) as `tags`
from
itemtag
join item on itemtag.itemid=item.id
join
/* join the query from above (scores per tag) */
(select tagid, count(*) as `tagscore` from itemtag group by tagid ) as `TagScores`
on `TagScores`.tagid=itemtag.tagid
where
itemid<>@ItemOfInterest and
/* get the taglist of the current item */
tagid in (select distinct tagid from itemtag where itemid=@ItemOfInterest)
group by
itemid
order by
2 desc
Explanation: The query has 2 subqueries: One is to obtain the list tags from the item of interest. We only want to work with those. The other subquery generates a list of scores per tag.
So in the end, each item in the database has a list of tag scores. Those scores are added up with sum(tagscore)
, and that number is used to order the result (highest scores on top).
To show a list of available tags, I've used GROUP_CONCAT.
The query will result in something like this (I've made the actual data up here):
Item TagsScore Tags
item3 15 red,cool,car
item4 7 red,car
item5 7 red
item1 5 car
item6 5 car
Upvotes: 2