Reputation: 8961
what is the difference between a relation of a relational database and a dimension as represented in a star diagram?
As part of an assignment I have a relational datawarehouse design, where most of the tables have been normalised using many to many, one to one, one to many relationship schema (I think this is the right terminology? please correct me if I'm wrong). The next step is to draw a star diagram that could be used in a data-mining environment, which I guess means a fact table that draws from different dimensions...
I'm a little confused here because 1. any data-analysis I could think of could be taken from the relational database, so whats the point of re-structuring it? and 2. If some of the tables that you want to draw data from contain foreign keys, how do you split that into dimensions.
for example: I have these relations:
Courses {course_id, description}
Modules {module_id, description}
Course_modules {course_id, module_id}
Students {student_id, address, enrollment_option, enrollment_date, name, surname, nationality, home_language, gender ...}
Module_grades {student_id, module_id, assignment_1, assignment_1_sub_date, assignment_2, assignment_2_sub_date, exam, exam_date, overall_result}
and I'd like to know how course results relate to module grades. With a relational database I would query to join a table containing students information with the module grades table. What would be the equivalent with dimensions and reports? Especially as I'm using multiple columns as my primary key in the grades relation..
Upvotes: 0
Views: 127
Reputation: 48246
An operational database is highly normalized, which improves write performance, and minimizes write anomalies. It is designed to facilitate transaction processing.
An analytic database (data warehouse) is highly denormalized, which improves read performance, and makes it easier for non DBAs to understand. It is designed to facilitate analysis.
what is the difference between a relation of a relational database and a dimension
A data warehouse can be in a relational database, and can use its relations (tables), so there is no difference.
any data-analysis I could think of could be taken from the relational database, so whats the point of re-structuring it?
A data warehouse often includes data from many sources, not just your operational database. Examples: emails, website scraping.
If you tell your boss to join ten tables to do a simple analysis, you will get fired.
If some of the tables that you want to draw data from contain foreign keys, how do you split that into dimensions.
This depends entirely on what you are trying to analyze, but in general you denormalize and copy the data to dimension tables.
You need to start with a process or event that you want to analyze.
Use Excel. Add all the columns that are pertinent to your analysis. For example, if you were analyzing the process of people visiting your website, each row in Excel would represent a site visit, and columns might be start_time, # pages visited, first page, last page, etc.
Now do ONE level of normalization. Find categorical columns that you can group together (like info about the user's web browser). These would go in a browsers dimension table. Find (true) numerical values that you cannot normalize out. These are measures. Example, the number of pages visited.
The measures, and keys that refer to your dimension tables, are your fact table.
Now go read this book.
Upvotes: 1