Reputation:
from this post What is the most efficient way to store tags in a database?
It was recommended to store tag tables like this.
Table: Item
Columns: ItemID, Title, Content
Table: Tag
Columns: TagID, Title
Table: ItemTag
Columns: ItemID, TagID
And another SO post said the same thing. Can anyone explain why tags should be stored like this? I am guessing ItemID is some internal val, title is the tag name (c++, sql, noob, etc) content is whatever else data i want to store with my item. why not something like
Table: Item
Columns: ItemID, Title, <more data i want>
Table: TagList
Columns: ItemID, Title
title in item being "item name" and tag title being "c++" "sql" "noob" "etc"
Upvotes: 1
Views: 1621
Reputation: 562310
There's nothing wrong with the second design you show, the one with the TagList
table, except that it takes more space.
That is, if you tag 10,000 items with the tag "database-design", then in the two-table design, you have to store that string 10,000 times. If space-efficiency is more important, you could use the three-table design, which would only store the 4-byte integer ID for "database-design" 10,000 times. A savings of 10 * 10,000 bytes.
Another difference is that in the three-table design, you could have more than one row in the Tag
table with the same string, even though they have different integer ID values. So in the ItemTag
table, they would appear to be different tags, and you'd never know that they're actually tagged similarly. Whereas in the two-table design, tags with the same spelling become grouped together implicitly.
Another point: if you have the need to change the spelling of tags, then in the two-table design you have to update many rows. In the three-table design, you only need to update a single row.
And finally, if you commonly need a list of unique tags, it's more speedy to query the Tags
table in the three-table design, instead of needing a SELECT DISTINCT tag FROM TagList
every time you need the unique list. And the latter only gives you a list of tags in use, not a list of all eligible tags.
Upvotes: 6
Reputation: 89661
Why? It's normalized. ItemID will be the primary key (perhaps a surrogate or identity), TagID will almost certainly be a surrogate/identity and for constraints/performance you'll have a unique constraint and/or index (perhaps even clustered on the tag.title).
In your example, TagList will not have a primary key (unless perhaps both columns will be the primary key), and it cannot be normalized to remove tag title redundancy (two items with the same tag will have separate entries). In addition, because tags aren't entities in their own right in the second model, they can't be linked to a different type of entity with another ItemTag table.
Upvotes: 1