Reputation: 25
I am currently implementing a data warehouse with SQL Server 2014 and I am struggling with the layout of the dimension tables, well, more the extraction of the data. I have an Excel-File with the following format:
+--------+---------+------------+------------+------------+---+-------------+
| Region | Country | Location | Question 1 | Question 2 | … | Question 97 |
+--------+---------+------------+------------+------------+---+-------------+
| Africa | Algeria | Location 1 | Yes | Yes | … | No |
| Africa | Algeria | Location 2 | No | Yes | … | Yes |
+--------+---------+------------+------------+------------+---+-------------+
There are some more attributes about who answered the question when. As indicated with the dots, there are more than 3 Questions, actually there are 97 of them. Currently I've imported the excel file into my raw table which represents my preparation area(It has the exact same format as the excel file with first row as headers). Now, when I want to put the data into staging I'm struggling a bit with the layout. So I thought about the layout and since my boss told me that the questions could change in future, I came up with the following two tables:
+------------+---------------------+
| DIM_Questionnaire |
+------------+---------------------+
| PK | SUR_QuestionnaireID |
| FK | LocationID |
| FK | TimeKey |
+------------+---------------------+
+--------------+-----------------+
| DIM_Question |
+--------------+-----------------+
| PK | SUR_QuestionID |
| FK | QuestionnaireID |
| | Question |
| | Answer |
| | Category |
+--------------+-----------------+
The Survey is then added to a fact table. What do you think about this solution in a warehouse? Furthermore, how would i be able to get the column name from the raw-table as a value in in the Question table?
Upvotes: 0
Views: 210
Reputation: 1405
It's not entirely clear what your terminology means. It seems that there are multiple Surveys (collections of returned questionnaires, each of which has the same set of question). But then, surely your Survey dimension should have a FK to a QuestionnaireID?
The structure of your Dim Question isn't consistent with this. According to the FK to QuestionnaireID, the Questionnaire seems to be the "class" (set of questions sent out to multiple people). So what's the "instance of the class" (specific copy of the set of questions, returned by one person with answers)?
I think your fact should be Answer, with dimensions Question, (possibly) Survey, Questionnaire (could be an attribute of dimension Question). But I can't be entirely sure about this (except that Answer is definitely the fact) because I'm not sure how you're using the terms.
Upvotes: 0
Reputation: 104
I think you have worked out a good solution. As you have mentioned, the very first layout is not quite good since many times the number of questions per survey may change and if you have 97 question columns for example, if you get a requirement to hold 100 questions, you have to change the table structure. and if the questions are about 20, you waste lot of columns unnecessarily. Good work on figuring out the structure.
Your structure looks fine to me. Just one thing I want to clarify from you? the answer in the DIM_Question table is the correct answer right? The answers that users provide should go to another answer table which holds the DIM_Question SK so that any time you can link the answer with the exact question on the question dimension.
And one more thing, why did you add the survey table as a fact table? i think your measures are "answers" right? for example, if someone asks "How many of them answered correctly to this question?", "what type of answers that people have provided for this question?" etc etc. I think you should double check on that. If you are stuck, just provide some hits on the actual requirement, so we can figure it out.
and finally for your next question "how would i be able to get the column name from the raw-table as a value in in the Question table? " I think you can get the table column names from below query
SELECT *
FROM <database_name>.INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'Dim_Table'
Hope this helps.
Upvotes: 1