codeomnitrix
codeomnitrix

Reputation: 4249

problem related to atomicity in database table

I am creating a forum page for which i have created following database schema:

Forum(questionId, postedByUserId, questionSubject, questionBody, TagIds);

Tags(tagId, tagName);

Entries in Forum will be something like:

(1, 1, 'sample subject', 'sample body', '1 4 2') ...

And sample entries of Tags will be:

(1, 'C'), (2, 'C++'), (3, 'Java'), (4, 'Data Structure') ...

Now the problem is that first normal form says that all the fields should be atomic which is not satisfied in this case but i think space is saved as if i were creating a new table of forum_tag(questionId, tagId); then i think this will take more space on database, but would be correct conceptually.

So i don't know what should i do whether to do what i am doing right now or to make the coloumns atomic as per the normalization.

Please explain which is better and why because there are many cases when i found such problem but all the time i remain ambiguous that what should i do!

So please help.

Thanks in advance :)

Upvotes: 1

Views: 415

Answers (3)

Brett Rossier
Brett Rossier

Reputation: 3482

I would go with making your fields atomic. Most times you have a field that jumbles values up into one field, you end up with headaches later when you have to constantly pry that data apart for reporting or analysis. What if you wanted to do something as simple as getting a count of your tags? Because of non-atomic data you wouldn't even be able to do a quick SELECT COUNT(). You'd also have big issues creating queries that cross reference forum posts with different tags. Say you wanted a query of all forum posts tagged with "programming"?

Making the data atomic up front makes it much easier to work with down the road, when you're trying to query or analyze it. Put it this way, data starts out generalized before it enters your DB, but you're always going to want specifics from it. Try to keep the data in discrete chunks so that it's easier to get at the specifics.

Upvotes: 1

Rik
Rik

Reputation: 29243

You should make a third table representing the relationship between Forum and Tags:

ForumTags(ftID, Forum, Tag)

This way, your database is properly normalized, so adding and removing tags to forums becomes much easier. Don't worry about the extra space it might take in the database, like Walter Mitty says: Space is cheap, retrieval much less so. As a general rule: Normalization is always a good idea, unless explicitly proven otherwise

Upvotes: 0

Walter Mitty
Walter Mitty

Reputation: 18940

Space is cheap in a database. Retrieval time, which varies with space, is much less cheap. however, retrieval time can also be affected by whether a keyed access strategy will work, and will be chosen by the query optimizer. The effect can be dramatic.

Consider the following retrieval oon the schema you proposed: find all the forum entries where one of the related tags is "4". For most DBMSes, this query will require a sequntial scan through the entire Forums table. Depending on data volume, this could be millions of disk I/O s.

Now consider a junction table

ForumTags (ForumId, TagId) primary key (ForumId, TagId)

Further, let's say that there's an index on TagId in addition to the automatic index on (ForumId, TagId)

The same query would result in a index lookup of value "4" in one of the indexes, and require as few as a dozen disk I/Os.

One of the goals of normalization is keyed access to all data. first normal form is pursuant to that goal.

I've had real life situations where a first normal form or better schema can be compared to a schema with embedded lists. The speed difference in those cases was something like 50 to 1.

Upvotes: 1

Related Questions