Anton Sergeyev
Anton Sergeyev

Reputation: 979

Select rows with similar tags (many-to-many relation) from SQL database

There is my entity table and tags bound to it through another table (many-to-many relation).

I have an entity selected and my goal is to find a set of entities which have as much similar tags with it as possible. This set must be ordered by 'similarity' – amount of tags each entity shares with chosen entity. Tags are similar if they have the same ids.

I'm wondering if there's elegant and fast way to do that with single query.

The only solution I see now is to fetch all the tag-entity relations and compute their similarity in my application, and then make another database query to select what i've computed, but it doesn't look very graceful.

Database structure:

entity id ...

tag id name

entity_tag entity_id tag_id

Update: final solution for MySQL.

So I have tables for paintings, tags and painting_tag relation. This query fetches similar paintings and their 'similarity index' for previously selected painting.

SELECT site_painting.*, Count(tr.tag_id) as similarity
From site_painting_tag_relation as tr
Inner Join site_painting_tag_relation as tr2 ON ( tr2.tag_id = tr.tag_id and tr2.painting_id = :id )
Left join site_painting on site_painting.id=tr.painting_id
Where tr.painting_id <> :id
Group By  tr.painting_id
Having Count(*) > 0
Order By  Count(*) DESC, tr.painting_id limit 1

Upvotes: 1

Views: 1802

Answers (1)

donPablo
donPablo

Reputation: 1959

OK, The info on db structure helps a lot --

entity id ...

entity_tag entity_id tag_id

tag id name

Lets look at some example values--

entity
  id=100
  id=...
  id=199

entity_tag
  100, 3
  100, 5
  101, 1
  102, 7
  ...
  199, 3
  199, 7

tag
  id=1
  id=...
  id=10

So if we denormalize the entity_tag we have

  100   3,5
  101  1
  102       7
  199   3   7

And a similarity index of 199 to the rest, on a scale of 0 to 10

  100   1 in common
  101   0 in common
  102   1 in common
  199 self, no comparison

And if I have it right, we want to display 100 and 102 being the highest, nu??

Here is a stab at the SQL-- It might be something like ---

SELECT  TOP 10
FROM
(SELECT
   allET.EID,
   Count(*) as Similarity


 From entity_tag as allET

 Left Join 
      (Select * From  entity_tag  Where EID = myEID ) as myET
 On allET.TID = myET.TID

 Where allET.EID <> myEID

 Group By  allET.EID

 Having Count(*) > 0

 Order By  Count(*) DESC, allET.EID
)

Upvotes: 2

Related Questions