user34537
user34537

Reputation:

Explain db tag storage method

from this post What is the most efficient way to store tags in a database?

It was recommended to store tag tables like this.

Table: Item
Columns: ItemID, Title, Content

Table: Tag
Columns: TagID, Title

Table: ItemTag
Columns: ItemID, TagID

And another SO post said the same thing. Can anyone explain why tags should be stored like this? I am guessing ItemID is some internal val, title is the tag name (c++, sql, noob, etc) content is whatever else data i want to store with my item. why not something like

Table: Item
Columns: ItemID, Title, <more data i want>

Table: TagList
Columns: ItemID, Title

title in item being "item name" and tag title being "c++" "sql" "noob" "etc"

Upvotes: 1

Views: 1621

Answers (2)

Bill Karwin
Bill Karwin

Reputation: 562310

There's nothing wrong with the second design you show, the one with the TagList table, except that it takes more space.

That is, if you tag 10,000 items with the tag "database-design", then in the two-table design, you have to store that string 10,000 times. If space-efficiency is more important, you could use the three-table design, which would only store the 4-byte integer ID for "database-design" 10,000 times. A savings of 10 * 10,000 bytes.

Another difference is that in the three-table design, you could have more than one row in the Tag table with the same string, even though they have different integer ID values. So in the ItemTag table, they would appear to be different tags, and you'd never know that they're actually tagged similarly. Whereas in the two-table design, tags with the same spelling become grouped together implicitly.

Another point: if you have the need to change the spelling of tags, then in the two-table design you have to update many rows. In the three-table design, you only need to update a single row.

And finally, if you commonly need a list of unique tags, it's more speedy to query the Tags table in the three-table design, instead of needing a SELECT DISTINCT tag FROM TagList every time you need the unique list. And the latter only gives you a list of tags in use, not a list of all eligible tags.

Upvotes: 6

Cade Roux
Cade Roux

Reputation: 89661

Why? It's normalized. ItemID will be the primary key (perhaps a surrogate or identity), TagID will almost certainly be a surrogate/identity and for constraints/performance you'll have a unique constraint and/or index (perhaps even clustered on the tag.title).

In your example, TagList will not have a primary key (unless perhaps both columns will be the primary key), and it cannot be normalized to remove tag title redundancy (two items with the same tag will have separate entries). In addition, because tags aren't entities in their own right in the second model, they can't be linked to a different type of entity with another ItemTag table.

Upvotes: 1

Related Questions