SQL - Survey Data, Table Schema Design for looped survey questions

Question

Suppose we have a survey where some of the questions are asked across multiple entities.

For example:
Car Brands = [Brand 1, Brand 2, Brand 3, Brand 4...]

These questions will be asked for each of the car brands (looped).
Question Q01 = (Scale 1-10) Do you think [Car Brand] cars are reliable?
Question Q02 = (Scale 1-10) Do you think [Car Brand] cars are a good value?
...

I'm designing a schema that will power some web-based analytics tools, so query performance is important.

The schema will be 3 tables: Records, Questions, Answers

I have two approaches for the answers table:

A) Table: Answers

QuestionId | AnswerValue | BrandOption 
   Q01     |      7      |      1
   Q01     |      5      |      2
   Q01     |      4      |      3
   Q01     |      8      |      4

B) Table: Answers

QuestionId | AnswerValue
  Q01-1    |     7
  Q01-2    |     5
  Q01-3    |     4
  Q01-4    |     8

The queries can be either for one brand at a time or for all the brands, with equal priority for both queries.

Option A seems to give me some advantages if I ever need to do something like a GROUP BY. However, if most of the queries are for a specific brand, Option B seems to be more efficient.

Thoughts?

Zohar Peled · Accepted Answer

Option A is better, even if you don't see it right now.
Storing multiple values in a single database "cell" is a mistake any way you look at it (though unfortunately, a very common one). It is also a violation of first normal form, which states that each column can contain only a single atomic value in each row (though the original rule uses different terminology).

The disadvantages are numerous and some of them are critical, including (but not limited to):

You lose the ability to use the proper data type - two ints stored together must be stored as some data type other than int.
You might lose the ability to verify that your data is, in fact, correct, or that the different parts can be converted to the correct data type (most databases support check constraints nowadays, but not all - yes, MySQL before 8.0.16, I'm pointing my finger at you!).
You lose the ability to enforce uniqueness on each part of the data separately.
You can't use the different parts of the data as the basis for foreign key constraints.
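
To make the points above concrete, here is a minimal sketch of Option A using Python's built-in sqlite3 module. The Brands table, the RecordId and Wording columns, the constraint layout and the try_insert helper are all assumptions added for illustration. They are not part of the original question, and other database engines will use slightly different syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE Brands (                 -- hypothetical lookup table
    BrandId INTEGER PRIMARY KEY,
    Name    TEXT NOT NULL UNIQUE
);
CREATE TABLE Questions (
    QuestionId TEXT PRIMARY KEY,
    Wording    TEXT NOT NULL          -- hypothetical column
);
CREATE TABLE Answers (
    RecordId    INTEGER NOT NULL,     -- hypothetical link to the Records table
    QuestionId  TEXT    NOT NULL REFERENCES Questions(QuestionId),
    BrandOption INTEGER NOT NULL REFERENCES Brands(BrandId),
    AnswerValue INTEGER NOT NULL CHECK (AnswerValue BETWEEN 1 AND 10),
    UNIQUE (RecordId, QuestionId, BrandOption)
);
INSERT INTO Brands VALUES (1, 'Brand 1'), (2, 'Brand 2');
INSERT INTO Questions VALUES ('Q01', 'Do you think [Car Brand] cars are reliable?');
INSERT INTO Answers VALUES (1, 'Q01', 1, 7);
""")

def try_insert(row):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO Answers VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

accepted     = try_insert((1, "Q01", 2, 5))    # valid row
duplicate    = try_insert((1, "Q01", 2, 6))    # same record, question and brand
out_of_range = try_insert((2, "Q01", 1, 11))   # outside the 1-10 scale
bad_brand    = try_insert((2, "Q01", 99, 5))   # brand 99 does not exist
```

With a combined key such as "Q01-1" (Option B), none of these three rejections can be expressed as a plain column constraint, because the brand is buried inside a string.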

The list goes on and on - but I think anyone should get the picture by now - a database column should be used to store a single value for each row - every time.
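
As for the two query patterns mentioned in the question, both are straightforward against Option A. The following is a rough sketch, again with Python's sqlite3 module. The index name and the two extra sample rows are made up for illustration, and whether a composite index like this one is the right choice depends on the real workload and database engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Answers (
    QuestionId  TEXT    NOT NULL,
    BrandOption INTEGER NOT NULL,
    AnswerValue INTEGER NOT NULL
);
-- Hypothetical composite index covering both query shapes below.
CREATE INDEX IX_Answers_Question_Brand ON Answers (QuestionId, BrandOption);

-- The four rows from the question, plus two invented rows.
INSERT INTO Answers VALUES
    ('Q01', 1, 7), ('Q01', 2, 5), ('Q01', 3, 4), ('Q01', 4, 8),
    ('Q01', 1, 9), ('Q01', 2, 3);
""")

# One brand at a time: a plain equality filter on two typed columns.
one_brand = conn.execute(
    "SELECT AVG(AnswerValue) FROM Answers "
    "WHERE QuestionId = ? AND BrandOption = ?",
    ("Q01", 1),
).fetchone()[0]

# All brands at once: GROUP BY on the brand column, no string parsing.
all_brands = conn.execute(
    "SELECT BrandOption, AVG(AnswerValue) FROM Answers "
    "WHERE QuestionId = ? GROUP BY BrandOption ORDER BY BrandOption",
    ("Q01",),
).fetchall()
```

The single-brand lookup is an equality match on the leading columns of the index, so it should not be at a disadvantage compared to matching a combined "Q01-1" string.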
