Reputation: 68648
I seem to often find myself wanting to store data of more than one type (usually specifically integers and text) in the same column in a MySQL database. I know this is horrible, but the reason it happens is when I'm storing responses that people have made to questions in a questionnaire. Some questions need an integer response, some need a text response and some might be an item selected from a list.
The approaches I've taken in the past have been:
Store everything as text and convert to int (or whatever) when needed later.
Have two columns - one for text and one for int. Then you just fill one in per row per response, and leave the other one as null.
Have two tables - one for text responses and one for integer responses.
I don't really like any of those, though, and I have a feeling there must be a much better way to deal with this kind of situation.
To make it more concrete, here's an example of the tables I might have:
CREATE TABLE question (
id int(11) NOT NULL auto_increment,
text VARCHAR(200) NOT NULL default '',
PRIMARY KEY ('id')
)
CREATE TABLE response (
id int(11) NOT NULL auto_increment,
question int (11) NOT NULL,
user int (11) NOT NULL,
response VARCHAR(200) NOT NULL default ''
)
or, if I went with using option 2 above:
CREATE TABLE response (
id int(11) NOT NULL auto_increment,
question int (11) NOT NULL,
user int (11) NOT NULL,
text_response VARCHAR(200),
numeric_response int(11)
)
and if I used option 3 there'd be a responseInteger table and a responseText table.
Is any of those the right approach, or am I missing an obvious alternative?
Upvotes: 4
Views: 709
Reputation:
[Option 2 is] NOT the most normalized option [as @Ray claims]. The most normalized would have no nullable fields and obviously option 2 would require a null on every row.
At this point in your design you have to think about the usage, the queries you'll do, the reports you'll write. Will you want to do math on all of the numeric responses at the same time? i.e. WHERE numeric_response IS NOT NULL? Probably unlikely.
More likely would be, What's the average response WHERE Question = 11. In those cases you can either choose the INT table or the INT column and neither would be easier to do than the other.
If you did do two tables, you'd more than likely be constantly unioning them together for questions like, what % of questions have a response etc.
Can you see how the questions you ask your database to answer start to drive the design?
Upvotes: 4
Reputation: 754860
I'd opt for Option 1. The answers are always text strings, but sometimes the text string happens to be the representation of an integer. What is less easy is to determine what constraints, if any, should be placed on the answer to a given question. If some answer should only be a sequence of one or more digits, how do you validate that? Most likely, the Questions table should contain information about the possible answers, and that should guide the validation.
I note that the combination of QuestionID and UserID is (or should be) unique (for a given questionnaire). So, you really don't need the auto-increment column in the answer. You should also have a unique constraint (or primary key constraint) on the QuestionID and UserID anyway (regardless of whether you keep the auto-increment column).
Upvotes: 2