Slowly Changing Dimension Type 2

I am trying to create an SCD type 2 data warehouse with Pentaho. I don't actually know how to load surrogate_ids and other data simultaneously and properly. I'd be glad if someone helps.

I'm trying to do SCD type 2. But I couldn't do it because I was new. I want to see the real project.

Upvotes: 0

Views: 141

Answers (1)

Bert-Jan Stroop
Bert-Jan Stroop

Reputation: 549

The most common way, and probably the most effectiove way is to

  • either setup a view in your database with the functionality for generating the data of the SCD2 and loading this into pentaho via table input step
  • or by retrieving the data from your database via table input, and writing your functionality to generate the SCD2 directly in your input table step.

Paralel to that also have a second inputstep connected to your DWH where you are retrieving the current data from the set. In case of SCD2 just retrieving active data is normally enough, but it depends on your precise layout of the dimension.

Then incluse a merge step to compare both the new data (from rel database) to the existing data (from dwh). This comparison is NOT done on the surrogate key, but on the real multicolumn key. So probably something like systemid, validitystartddate or systemid, version. Depends again on your setup of the SCD2.

After the comparison then the merge step returns a flagfield per row, which indicates if the row is updated, deleted, new, or unchanged. Based on these flagfields you can use a switch step to send all of the four flows (or 2 if you are ignoring the unchanged (no action needed) and deleted (history is maintained)) to the right functionality.

So in case of changed, you might want to update the row, which in principle should just be an update of an validity_enddate from null to a date.

In case of new, you might want to insert the row. If you setup a sequence as default value for you surrogate_id in your dwh it should automatically get a new surrogate_id.

If needed i will add some screenshots later if i have some more time. Let me know if it is needed.

Upvotes: 0

Related Questions