Christian K.
Christian K.

Reputation: 538

MySQL Query to retrieve results based on multiple Tags

If have to deal with an existing database structure and try to find an efficient way to select files based on their Tags. One table has the "Files", one has the "Tag Description" and the third one holds all "Tags" related to a file.

How can I select all files with: Language = 'ENG' and Measure = 'METRIC' and Category = 'Type1'? (So result should be one File: ID 100).

This is a simplified version of the 3 tables concerned:

CREATE TABLE IF NOT EXISTS `files` (
  `file_id` int(11) NOT NULL DEFAULT '0',
  `file_name` varchar(64) DEFAULT NULL,
  PRIMARY KEY (`file_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

INSERT INTO `files` (`file_id`, `file_name`) VALUES
(100, 'testfile_1'),
(200, 'testfile_2'),
(300, 'testfile_3'),
(400, 'testfile_4');


CREATE TABLE IF NOT EXISTS `tag_parents` (
  `parent_id` int(11) NOT NULL DEFAULT '0',
  `parent_name` varchar(64) DEFAULT NULL,
  PRIMARY KEY (`parent_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

INSERT INTO `tag_parents` (`parent_id`, `parent_name`) VALUES
(1, 'Language'),
(2, 'Measure'),
(3, 'Category');


CREATE TABLE IF NOT EXISTS `tags` (
  `tag_id` int(11) NOT NULL DEFAULT '0',
  `file_id` int(11) DEFAULT NULL,
  `tag_parent_id` int(11) DEFAULT NULL,
  `tag_value` varchar(64) DEFAULT NULL,
  PRIMARY KEY (`tag_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

INSERT INTO `tags` (`tag_id`, `file_id`, `tag_parent_id`, `tag_value`) VALUES
(1, 100, 1, 'ENG'),
(2, 200, 1, 'ENG'),
(3, 300, 1, 'FRA'),
(4, 400, 1, 'DEU'),
(5, 100, 2, 'METRIC'),
(6, 200, 2, 'IMPERIAL'),
(7, 300, 2, 'METRIC'),
(8, 400, 2, 'IMPERIAL'),
(9, 100, 3, 'Type1'),
(10, 200, 3, 'Type3'),
(11, 300, 3, 'Type1'),
(12, 400, 3, 'Type1');

Any help is appreciated. Thank you! ( I have failed with all my trials so far or they have been far to slow (with subselects)).

Upvotes: 0

Views: 206

Answers (1)

wwkudu
wwkudu

Reputation: 2796

It makes me think of meta-models where the properties of an object (file in this case) are not columns but values to be looked up in the tags. It will always be slower than having the columns directly in the table, but you should be able to get it to work reasonably. I see tag_parents as tag_type. The following (fiddle here) should work:

select f.* 
from files f
where exists ( -- it should have the "Category"."Type1"
  select parent_id
  from tag_parents categoryT, tags category 
  where categoryT.parent_name="Category"
    and category.tag_parent_id=categoryT.parent_id
    and category.tag_value="Type1"
    and category.file_id=f.file_id
  )
  and exists ( -- as well as "Language"."ENG"
    select parent_id
    from tag_parents languageT, tags language
    where languageT.parent_name="Language"
      and language.tag_parent_id=languageT.parent_id
      and language.tag_value="ENG"
      and language.file_id=f.file_id
    )
  and exists ( -- as well as "Measure"."METRIC"
    select parent_id
    from tag_parents measureT, tags measure
    where measureT.parent_name="Measure"
      and measure.tag_parent_id=measureT.parent_id
      and measure.tag_value="METRIC"
      and measure.file_id=f.file_id
    )

You could simplify your life by defining some views such as Category, Language and Measure (and whatever other tag_parents you'll have). This will make queries more readable. With three views category, measure and language you could be write:

select * from files f, category c, measure m, `language` l
where f.file_id=c.file_id and c.value="Type1"
  and f.file_id=l.file_id and l.value="ENG"
  and m.file_id=l.file_id and m.value="METRIC";

Or even better, if you have a fixed number of these properties, you could define a view similar to the query at the top, but using outer joins rather than exists, with nullable columns category, measure and language:

create view filesView (file_id, category, measure, `language`) as
  select f.file_id, c.tag_value, m.tag_value, l.tag_value
  from files f
    left outer join (tags c, tag_parents ct) on c.file_id=f.file_id
      and c.tag_parent_id=ct.parent_id
      and ct.parent_name="Category"
    left outer join (tags l, tag_parents lt) on l.file_id=f.file_id
      and l.tag_parent_id=lt.parent_id
      and lt.parent_name="Language"
    left outer join (tags m, tag_parents mt) on m.file_id=f.file_id
      and m.tag_parent_id=mt.parent_id
      and mt.parent_name="Measure";

And then you can write:

select file_id, category, measure, `language` 
from filesView
where category="Type1"
  and `language`="ENG"
  and measure="METRIC";

(Sorry got carried away a bit.)

Upvotes: 3

Related Questions