Reputation: 55
I am working on a CMS system (largely as a learning exercise) for a private website. Atm I have three tables: one for articles, one for tags and a joining table so that each article can have multiple tags.
The table I am having issues with consists of three columns -
article_tags: id (auto_increment), article_id, tag_id
My problem stems from the fact that an article can appear any number of times, and a tag can also appear any number of times, however a given combination of the two should only appear once - that is, each article should only have one reference to any single tag. Currently it is possible to INSERT "duplicate" rows where the id is different, but the combination of article_id and tag_id are the same:
id , article_id, tag_id
1 1 1
2 1 2
3 2 1
4 1 1 <- this is wrong
I could check in PHP code for a record that contains this combination, but I'd prefer to do it in sql if possible (if it is not, or it is undesirable then I will do it using PHP). Due to the id being different and the inability to set unique columns things like INSERT IGNORE and ON DUPLICATE do not work.
I'm quite new to mySQL so if I'm doing something silly please point me in the right direction.
Thanks
Upvotes: 0
Views: 2925
Reputation: 108841
Such many-to-many relationship tables, sometimes called join tables, often have just two columns, and have a primary key that's a composite of the two.
article_id
tag_id
pk = (article_id, tag_id)
If you change the definition of that table you will definitively solve that problem.
How should you order the columns in composite keys? It depends on how your application will look up items in the join table. If you'll always start with the article_id and look up the tag_id, then you put the article_id first in the key. The DBMS can random-access values for the first column in the key, but has to scan the index to find values in second (or subsequent) columns in the key.
You may want to create a second index on the table, (tag_id, article_id)
. This will allow fast lookups based on the tag_id. You may ask, "why bother to put both columns in the index?" That's to make the index into a covering index. In a covering index, the desired value can be retrieved directly from the index. For example, with a covering index,
SELECT article_id FROM article_tag WHERE tag_id = 12345
(or a JOIN that uses similar lookup logic) only needs to access the index on the disk drive to get the result. If you don't have a covering index, the query needs to jump from the index to the data table, which is an extra step.
Join tables typically have very short rows (a couple of integers) so the duplicated data for a couple of covering indexes (the primary key and the extra one) isn't a big disk-space hog.
Upvotes: 3
Reputation: 19026
You should review your table definition.
You can (from best to worst):
SELECT DISTINCT(article_id, tag_id) FROM
...
without changing anything in your tableRight now, your table is defined as something like this:
CREATE TABLE IF NOT EXISTS `article_tags` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`article_id` int(11) NOT NULL,
`tag_id` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
The best solution (option 1) would be to remove your current (auto_increment) primary key and add a primary key (composite) on columns article_id and tag_id:
CREATE TABLE IF NOT EXISTS `article_tags` (
`article_id` int(11) NOT NULL,
`tag_id` int(11) NOT NULL,
PRIMARY KEY (`article_id`,`tag_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
But (option 2) if you absolutely want to keep your auto_increment primary key, add an index (unique) on your columns:
CREATE TABLE IF NOT EXISTS `article_tags` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`article_id` int(11) NOT NULL,
`tag_id` int(11) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `article_id` (`article_id`,`tag_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
Anyway, if you don't want to change your table definition, you could always use DISTINCT in your php query:
SELECT DISTINCT(article_id, tag_id) FROM article_tags
Upvotes: 4