Reputation:
I am building my first datawarehouse in SQL 2008/SSIS and I am looking for some best practices around loading the fact tables.
Currently in my DW I have about 20 Dimensions (Offices, Employees, Products, Customer, etc.) that are of Type 1 SCD. In my dw structure, there are a few things I have already applied:
In my Fact loading SSIS project, the current method I have for loading dimensions is having multiple lookups (20+) to each of the DIMs, then populating the FACT table with the data.
For my lookups I set:
Is this the best approach? Pictures attached to help with my description above.
Upvotes: 8
Views: 10793
Reputation: 2970
Looks fine. There are options if you start to run into performance issues, but if this is stable (finishes within data-loading time window, source systems aren't being drained of resources, etc), then I see no reason to change.
Some potential issues to keep an eye on...
A common alternative (to what you have above) is to extract the fact table data from the source system and land it in a staging area before doing the dimension key lookups via a single SQL statement. Some even keep a set of dimension key mapping tables in the staging area specifically for this purpose. This reduces locking/blocking on the source system...if you have a lot of data each load, and have to block the source system while you suck the data out and run it through those 20+ lookup transforms.
Having a good staging area strategy becomes more important when you have a large amount of data, large dimensions, complex key mappings (usually due to multiple source systems), and short data-loading time windows.
Upvotes: 5