Reputation: 2301
Hope so, this question fits in the Stack Overflow standard.
I am clear with the data warehouse concepts. But I am confused with practically building data warehouse as I just work on reporting tools.
Let's say I have a database in my Oracle with, consider 20 tables. So as far as I know, the step involved in DW building building is " Do the ETL "
But I am confused with ,
1] Defining dimensions & facts & the relationships between them in a table structure manner.
2] How do I choose or implement the schema that I want, say Fact Constellation ?
3] How should I determine this is dimension, this is fact ?
4] How should I make use of surrogate keys ?
In simple words, I want to build a Data Warehouse in my Database from existing data. How should I ?
Your answers, ideas, reference materials are welcome.
Upvotes: 3
Views: 2788
Reputation: 1357
First, do what Ian said.
There a multitude of different approached to data warehouse. Here is one persons view, it is not necessarily the best or correct one but I have have a few years experience working in what I imaging is a similar situation to yours (someone has said 'hey, lets build a data warehouse for reporting and stuff)
Here goes
1] Dimension Tables - look up tables with information that may change over time Fact Tables - Point in time values/state (Not usually dimensioned). Will usually contain a heap of aggregate-able data fields and links back to dimension tables
Products, Persons, Organisations are candidates for Dimention Tables. Sales, Org Chart Relations, Employee Counts are candidates for Fact Tables
Fact will contains FK links back to Dimension tables. Dimension tables should not have any direct FK references to other tables (Use a bridging table if you have to, but if you try and rebuild your source system as a dimensioned copy of itself you are going to have a bad time..)
2] Start with what you will be reporting and work back from there. Usually the answer is 'we need to report on everything' but eventually you will identify. (I'm not really sure what you are asking here)
3] High level: Consider a cross tab chart in excel. If the field a column or row header its probably a dimension. If its a number inside the chart, its probably a fact.
4] Always use your own surrogate keys in your warehouse. (Surrogate keys enable you to have a unique primary key in your dimensioned tables) Keep them secret from everyone, they only apply to your warehouse/data mart. [opinion]Even give your FACT tables their own surrogate keys even if they lend themselves to a composite PK[/opinion].
Again, I strongly urge you to spend more time investigation DW concepts. Come up with a strong internal concept of what your data warehouses goals are and how/if you are going to separate your warehouse from your data marts.
Consider spending the effort fixing your source systems so that they can produce the outputs you are after.
Upvotes: 2