John W. Mnisi
John W. Mnisi

Reputation: 907

Look up surrogate keys and Populate fact table

I have 8 surrogate keys that I need to lookup on the dimension tables and populate a fact table. I created 8 lookups to the dimension tables, and then use the merge join and sort components to merge the data into one result set for insertion into dimension table. My worry is that there's a lot of merging and sorting that takes place before I do an insert, and the performance is not good. What is the best way for achieving what I explained above?

Upvotes: 0

Views: 7264

Answers (2)

billinkc
billinkc

Reputation: 61221

You are well advised to be concerned about the sorting and merging. Instead, why not use the actual Lookup Component to translate your fact values into actual keys. Basically, take the queries you already have for your lookup merge sources and plop those into the Lookup Component. Only include the columns you need for either the match or to return the surrogate key.

In the unreadable image below, you can see a package that loads our daily headcount fact table. There are 10 total lookups in the worse scenario but you can observe that there are no sort or merge operations being performed. 41M rows hitting lookups of anywhere from 2 to just shy of 1M rows in it. The lookups will add additional columns to the existing row, which I believe is the same functionality you have expressed a desire to have without the negative performance implications of the sorts and merge joins.

enter image description here

Upvotes: 0

Nick.Mc
Nick.Mc

Reputation: 19194

The other alternative is to take a ELT (Extract Load Transform) approach - load your data into staging tables, run a single UPDATE statement to apply the surrogate keys in batch then load into the fact table. This will proabbly be faster and you have less bits of SSIS metadata to click and mess about with in the UI if any columns change.

i.e. a trivial example with three dimensions, including unknown (-1)

  1. Load data into staging via SSIS
  2. Run this via SSIS

    UPDATE StagingTable SET SK1 = ISNULL(Dimension1.SK,-1), SK2 = ISNULL(Dimension2.SK,-1), SK3 = ISNULL(Dimension3.SK,-1) FROM StagingTable LEFT OUTER JOIN Dimension1 ON Dimension1.SRCKey = StagingTable.SRCKey LEFT OUTER JOIN Dimension2 ON Dimension2.SRCKey = StagingTable.SRCKey LEFT OUTER JOIN Dimension3 ON Dimension3.SRCKey = StagingTable.SRCKey

  3. Run this via SSIS or use a data flow

    INSERT INTO factTable (SK1, SK2, SK3, Fact) SELECT SK1, SK2, SK3, Fact FROM StagingTable

Although there is a bit of hand coding in this I poersonally prefer this to clicking about and messing around in the SSIS user interface and putting up with the performance of the lookup components.

Upvotes: 2

Related Questions