ProFishChris
ProFishChris

Reputation: 55

Avoiding duplicate entries in an mySQL table with non unique columns

I am working on a CMS system (largely as a learning exercise) for a private website. Atm I have three tables: one for articles, one for tags and a joining table so that each article can have multiple tags.

The table I am having issues with consists of three columns -

article_tags: id (auto_increment), article_id, tag_id

My problem stems from the fact that an article can appear any number of times, and a tag can also appear any number of times, however a given combination of the two should only appear once - that is, each article should only have one reference to any single tag. Currently it is possible to INSERT "duplicate" rows where the id is different, but the combination of article_id and tag_id are the same:

id , article_id, tag_id
1       1           1
2       1           2    
3       2           1    
4       1           1    <- this is wrong

I could check in PHP code for a record that contains this combination, but I'd prefer to do it in sql if possible (if it is not, or it is undesirable then I will do it using PHP). Due to the id being different and the inability to set unique columns things like INSERT IGNORE and ON DUPLICATE do not work.

I'm quite new to mySQL so if I'm doing something silly please point me in the right direction.

Thanks

Upvotes: 0

Views: 2925

Answers (2)

O. Jones
O. Jones

Reputation: 108841

Such many-to-many relationship tables, sometimes called join tables, often have just two columns, and have a primary key that's a composite of the two.

  article_id
  tag_id
  pk = (article_id, tag_id)

If you change the definition of that table you will definitively solve that problem.

How should you order the columns in composite keys? It depends on how your application will look up items in the join table. If you'll always start with the article_id and look up the tag_id, then you put the article_id first in the key. The DBMS can random-access values for the first column in the key, but has to scan the index to find values in second (or subsequent) columns in the key.

You may want to create a second index on the table, (tag_id, article_id). This will allow fast lookups based on the tag_id. You may ask, "why bother to put both columns in the index?" That's to make the index into a covering index. In a covering index, the desired value can be retrieved directly from the index. For example, with a covering index,

 SELECT article_id FROM article_tag WHERE tag_id = 12345

(or a JOIN that uses similar lookup logic) only needs to access the index on the disk drive to get the result. If you don't have a covering index, the query needs to jump from the index to the data table, which is an extra step.

Join tables typically have very short rows (a couple of integers) so the duplicated data for a couple of covering indexes (the primary key and the extra one) isn't a big disk-space hog.

Upvotes: 3

Justin Iurman
Justin Iurman

Reputation: 19026

You should review your table definition.

You can (from best to worst):

  1. Add a composite primary key on (article_id and tag_id) and remove auto_increment (previous primary key)
  2. Add an index (UNIQUE) on (article_id and tag_id) and keep your auto_increment primary key
  3. Select distinct in php: SELECT DISTINCT(article_id, tag_id) FROM ... without changing anything in your table

Right now, your table is defined as something like this:

CREATE TABLE IF NOT EXISTS `article_tags` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `article_id` int(11) NOT NULL,
  `tag_id` int(11) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

The best solution (option 1) would be to remove your current (auto_increment) primary key and add a primary key (composite) on columns article_id and tag_id:

CREATE TABLE IF NOT EXISTS `article_tags` (
  `article_id` int(11) NOT NULL,
  `tag_id` int(11) NOT NULL,
  PRIMARY KEY (`article_id`,`tag_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

But (option 2) if you absolutely want to keep your auto_increment primary key, add an index (unique) on your columns:

CREATE TABLE IF NOT EXISTS `article_tags` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `article_id` int(11) NOT NULL,
  `tag_id` int(11) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `article_id` (`article_id`,`tag_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

Anyway, if you don't want to change your table definition, you could always use DISTINCT in your php query:

SELECT DISTINCT(article_id, tag_id) FROM article_tags

Upvotes: 4

Related Questions