Flambino
Flambino

Reputation: 18773

How to find complementary tags?

I have a MySQL database with 3 tables:

  1. The main table of records called "tracks" (as in music)
  2. A tags table called "tags"
  3. A join table for the two called "taggings"

The tags table is basically a list of genres, which is pre-defined. A track can then have 1 or more tags (via the join table).

The idea is, that the user checks off the genres (tags) for which he or she want to find tracks. But I would also like the interface to reflect which tags are no longer "useful", that is, tags which are complementary to the currently selected ones.

Edit: What I missed was that I need to find tags that are complementary to the currently selected set of tags. See my comment below.

Example: the user selects the "rock" and "pop" tags, and is shown a list of tracks that match "rock" + "pop". But suppose there are no tracks in the database that also match "jazz". In that case, I'd like to disable the "jazz" tag in the interface, because "rock" + "pop" + "jazz" would give zero results.

Is there a clever way to do this with MySQL?

Upvotes: 1

Views: 329

Answers (4)

Flambino
Flambino

Reputation: 18773

Ended up doing this:

  • When a track is tagged, its tag IDs are concatenated (in order) and added to a 1-column lookup table (and of course also to taggings). E.g. ",1,2,4,6,9," is added to the lookup (there's a reason for the leading/trailing commas)
  • When tags are selected while searching, they're similarly concatenated, and used in a LIKE clause which selects all the concatenations from the lookup table that contain those tag IDs
  • The found concatenations are then processed to get all the IDs that they contain
  • Whichever IDs are not in the resulting list are thus complementary to the selected ones

Upvotes: 0

krdluzni
krdluzni

Reputation: 797

Do you have control over how tags are added to the tracks? If you can hook into that, you can create some extra metadata.

Essentially a many-to-many join between tags. Every time you add a tag, you add a record to this join table containing an old-tag/new-tag pair (with the lower id as the first value to reduce duplication), for every one of the old tags on that track. IIRC, there's a way to set up the table to ignore duplicate inserts, rather than throwing errors. You'll also have to manage this table during deletion of tags, which will be more time-consuming, but that's probably a rare event.

With the above in place you have a simple query to identify tags that are still relevant:

select 
   tag_a
from
   related_tags
where
   tag_b in ([tags_already_in_search])
union
select
   tag_b
from
   related_tags
where
   tag_a in([tags_already_in_search])

This solution essentially shifts some of the processing time to the point in time where tags are added and removed.

Upvotes: 0

scottm
scottm

Reputation: 28699

This probably isn't the most efficient:

SELECT 
    TagId --to disable
FROM Tags 
WHERE TagId NOT IN(
    SELECT Distinct TagId FROM Taggings WHERE TrackId IN (
        SELECT TrackId FROM Taggings WHERE TagId in (1, 2)
    ) 
)
  1. Get all tracks matching currently selected tags.
  2. Take all tags assigned to those tracks
  3. Disable tags not in this list

Edit
What about this:

SELECT 
    DISTINCT TagId 
FROM 
    Taggings
GROUP BY 
    TagId,
    TrackId HAVING TrackId IN (
        SELECT 
            TrackId
        FROM 
            Taggings
        WHERE
         TagId in ( 1, 2)
    )

This should return all tags that should be enabled.

Upvotes: 0

krdluzni
krdluzni

Reputation: 797

select 
   tagid 
from 
   taggings 
where 
   trackid in ([list of, or subquery for your selected tracks])

Disable all the tags, unless the result contains their id. Or disable all tags, then re-enable those that are returned by this query. You could also do some restructuring and convert it to a 'not in' query, but that would normally be slower.

Upvotes: 1

Related Questions