user1709091
user1709091

Reputation:

SQL/SSIS DataWareHouse Fact table loading, best practices?

I am building my first datawarehouse in SQL 2008/SSIS and I am looking for some best practices around loading the fact tables.

Currently in my DW I have about 20 Dimensions (Offices, Employees, Products, Customer, etc.) that are of Type 1 SCD. In my dw structure, there are a few things I have already applied:

In my Fact loading SSIS project, the current method I have for loading dimensions is having multiple lookups (20+) to each of the DIMs, then populating the FACT table with the data.

For my lookups I set:

Is this the best approach? Pictures attached to help with my description above.

enter image description here enter image description here enter image description here

Upvotes: 8

Views: 10793

Answers (1)

Bill Anton
Bill Anton

Reputation: 2970

Looks fine. There are options if you start to run into performance issues, but if this is stable (finishes within data-loading time window, source systems aren't being drained of resources, etc), then I see no reason to change.

Some potential issues to keep an eye on...

  1. having 20+ full-cache lookup-transforms may pose a problem if your dimensions increase in size...due to memory constraints on the SSIS system...but since they are type 1, I wouldn't worry.
  2. full-cache lookups "hydrate" pre-execution...having 20+ of them may slow you down

A common alternative (to what you have above) is to extract the fact table data from the source system and land it in a staging area before doing the dimension key lookups via a single SQL statement. Some even keep a set of dimension key mapping tables in the staging area specifically for this purpose. This reduces locking/blocking on the source system...if you have a lot of data each load, and have to block the source system while you suck the data out and run it through those 20+ lookup transforms.

Having a good staging area strategy becomes more important when you have a large amount of data, large dimensions, complex key mappings (usually due to multiple source systems), and short data-loading time windows.

Upvotes: 5

Related Questions