leeeroy
leeeroy

Reputation: 11446

Associating tags with entities, store the tag once or not?

Consider I have a table with notes which could be associated with zero or more tags, how would I decide to do

create table notes (
  id int , -- primary key
  -- other fields
);
create table tagmapping (
 noteid int, -- refers notes.id
 tagid int, -- refers tags.id
 );
create table tags (
  int id, -- primary key
  tagname varchar(255)
);

vs storing the same tag potentially several times as in

create table notes (
  int id, -- primary key
  -- other fields

);

create table bar (
  id id, -- primary key
  tag varchar(255),
  -- other fields
  noteid int -- refers to notes.id, (not unique)
);

What mess/advantages would I get myself into going for the last approach ?

Upvotes: 1

Views: 121

Answers (3)

Thomas
Thomas

Reputation: 64635

The disadvantages of the second solution will in part rear themselves when you need to alter a tag or add attributes to tags. Suppose you have a tag named "Foo" and you now wish to add an attribute that would allow you to determine whether the tag is active or not. There is no means to do that in the second approach. You would have to enforce "magic tag names" in the presentation tier. In addition, you have no means of enforcing consistent naming on the tags. Someone could add a tag called "Foo" and another call "Fu" even though they are supposed to be the same tag.

Upvotes: 0

Levi Hackwith
Levi Hackwith

Reputation: 9332

For the second approach:

  1. It's be harder to query against (for reports and such) since the tags can be potentially misspelled and other things like that
  2. Speed. It's faster to query against an int than it is a string
  3. What if you need to rename a tag? You'll have to rename it in each and every note either manually or via a series of queries

  4. Storing duplicate data is generally considered a bad practice among DBA's and developer's. It's best always follow the normalization rules when you can

Upvotes: 0

Mark Byers
Mark Byers

Reputation: 838006

The first way is the standard way of implementing a many-to-many relationship. You should do this because this is what will surprise people the least.

A couple of issues with the second method:

  • You will have to duplicate the tag names, which introduces redundancy. It could be a waste of disk space if the tag names are long.
  • If you ever want to rename a tag, you have to update the entire table instead of just one field.

Upvotes: 0

Related Questions