sonam
sonam

Reputation: 3770

which database structure to choose for tagging system

I have a database structure as follow:

 tbl_products
 -pk id (AutoIncrement)
 -name
 -description

 tbl_tags (1)   OR   tbl_tags (2)
 -pk name            -pk id (AutoIncrement)
                     -name (unique)

 tbl_products_tags
 -fk product_id
 -fk tag_id

I have seen most choose data structures tbl_tags (2). I want to ask whether i could choose tbl_tags(1) since name is always unique, so i want to to make it primary. Does it have any downside ?

Upvotes: 0

Views: 81

Answers (2)

Chris Saxon
Chris Saxon

Reputation: 9875

If you make the tag name unique, you have to think about what you'll do if a name needs to be changed. For example, if I want to change "tag" to "tags".

If this is a primary key, then all the child records that refer to "tag" will also have to be updated so the constraint is valid. If you have a lot of rows referring to a given name, running this change is likely to be slow and introduce some blocking/contention into your application. Whereas if you use a surrogate primary key, you only have to update the unique name field, not all the child rows as well.

If you're certain that you'll never update a tag name then you could use it as the primary key. Beware of changing requirements however!

Natural keys generally make sense when using codes that are issued and managed by an external source (e.g. airport, currency and country codes). In these cases you can be sure that the natural key won't change and is guaranteed to be unique within the domain.

Upvotes: 3

XIVSolutions
XIVSolutions

Reputation: 4502

My understanding is there would be a marginal performance penalty to tbl_tags (1) in the context of a very large dataset when compared to option 2. In smaller datasets, probably not so much. The machine can process integers much more efficiently than strings.

In the bigger picture though, with modern processor speeds, the difference between the two might be negligable in all but the largest datasets.

Of course, I am speaking about relational databases here. The various flavors of NoSQL are a different animal.

Also, there is the matter of consistency. The other tables in your database all seem to be using (what I assume to be) an auto-incrementing integer ID. For that reason, I would use it on the tags table as well.

The use of auto-incrementing integer PK fields vs "Natural Keys" in designing a database is a long-standing debate. My understanding is academics largely prefer the "Natural Keys" concept, while in practice some form of generated unique key tends to be the norm.

Personally, I prefer to create generated keys which have no meaning to the end user, integers where possible. Unless I have missed something, index performance is significantly enhanced.

Upvotes: 1

Related Questions