Reputation: 538
If have to deal with an existing database structure and try to find an efficient way to select files based on their Tags. One table has the "Files", one has the "Tag Description" and the third one holds all "Tags" related to a file.
How can I select all files with: Language = 'ENG' and Measure = 'METRIC' and Category = 'Type1'? (So result should be one File: ID 100).
This is a simplified version of the 3 tables concerned:
CREATE TABLE IF NOT EXISTS `files` (
`file_id` int(11) NOT NULL DEFAULT '0',
`file_name` varchar(64) DEFAULT NULL,
PRIMARY KEY (`file_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO `files` (`file_id`, `file_name`) VALUES
(100, 'testfile_1'),
(200, 'testfile_2'),
(300, 'testfile_3'),
(400, 'testfile_4');
CREATE TABLE IF NOT EXISTS `tag_parents` (
`parent_id` int(11) NOT NULL DEFAULT '0',
`parent_name` varchar(64) DEFAULT NULL,
PRIMARY KEY (`parent_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO `tag_parents` (`parent_id`, `parent_name`) VALUES
(1, 'Language'),
(2, 'Measure'),
(3, 'Category');
CREATE TABLE IF NOT EXISTS `tags` (
`tag_id` int(11) NOT NULL DEFAULT '0',
`file_id` int(11) DEFAULT NULL,
`tag_parent_id` int(11) DEFAULT NULL,
`tag_value` varchar(64) DEFAULT NULL,
PRIMARY KEY (`tag_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO `tags` (`tag_id`, `file_id`, `tag_parent_id`, `tag_value`) VALUES
(1, 100, 1, 'ENG'),
(2, 200, 1, 'ENG'),
(3, 300, 1, 'FRA'),
(4, 400, 1, 'DEU'),
(5, 100, 2, 'METRIC'),
(6, 200, 2, 'IMPERIAL'),
(7, 300, 2, 'METRIC'),
(8, 400, 2, 'IMPERIAL'),
(9, 100, 3, 'Type1'),
(10, 200, 3, 'Type3'),
(11, 300, 3, 'Type1'),
(12, 400, 3, 'Type1');
Any help is appreciated. Thank you! ( I have failed with all my trials so far or they have been far to slow (with subselects)).
Upvotes: 0
Views: 206
Reputation: 2796
It makes me think of meta-models where the properties of an object (file in this case) are not columns but values to be looked up in the tags. It will always be slower than having the columns directly in the table, but you should be able to get it to work reasonably. I see tag_parents
as tag_type
. The following (fiddle here) should work:
select f.*
from files f
where exists ( -- it should have the "Category"."Type1"
select parent_id
from tag_parents categoryT, tags category
where categoryT.parent_name="Category"
and category.tag_parent_id=categoryT.parent_id
and category.tag_value="Type1"
and category.file_id=f.file_id
)
and exists ( -- as well as "Language"."ENG"
select parent_id
from tag_parents languageT, tags language
where languageT.parent_name="Language"
and language.tag_parent_id=languageT.parent_id
and language.tag_value="ENG"
and language.file_id=f.file_id
)
and exists ( -- as well as "Measure"."METRIC"
select parent_id
from tag_parents measureT, tags measure
where measureT.parent_name="Measure"
and measure.tag_parent_id=measureT.parent_id
and measure.tag_value="METRIC"
and measure.file_id=f.file_id
)
You could simplify your life by defining some views such as Category
, Language
and Measure
(and whatever other tag_parents
you'll have). This will make queries more readable. With three views category, measure and language you could be write:
select * from files f, category c, measure m, `language` l
where f.file_id=c.file_id and c.value="Type1"
and f.file_id=l.file_id and l.value="ENG"
and m.file_id=l.file_id and m.value="METRIC";
Or even better, if you have a fixed number of these properties, you could define a view similar to the query at the top, but using outer joins rather than exists, with nullable columns category, measure and language:
create view filesView (file_id, category, measure, `language`) as
select f.file_id, c.tag_value, m.tag_value, l.tag_value
from files f
left outer join (tags c, tag_parents ct) on c.file_id=f.file_id
and c.tag_parent_id=ct.parent_id
and ct.parent_name="Category"
left outer join (tags l, tag_parents lt) on l.file_id=f.file_id
and l.tag_parent_id=lt.parent_id
and lt.parent_name="Language"
left outer join (tags m, tag_parents mt) on m.file_id=f.file_id
and m.tag_parent_id=mt.parent_id
and mt.parent_name="Measure";
And then you can write:
select file_id, category, measure, `language`
from filesView
where category="Type1"
and `language`="ENG"
and measure="METRIC";
(Sorry got carried away a bit.)
Upvotes: 3