Reputation: 434
I'm using the top-down approach for my data warehouse, so the DW is in 3NF, and the data marts are multidimensional. I've read, that the DW should have a timestamp (or something like that) to preserve historical data.
My questions are:
Do I have to store data from the past in my DW, or is it enough to have time-variant dimensions (technically I use type 2 SCD's, where it's necessary).
If the DW must be time-variant, then:
when do we use SCD's in the top-down approach?
should I use the timestamps as the part of the table's PK?
Thanks!
Upvotes: 0
Views: 705
Reputation: 66622
If possible, try to avoid tracking history in a normalised schema. Type 2 SCDs are much, much simpler. However, you do need to make your data marts persistent - the history can't be reconstructed, so the data marts are the canonical source of your historical data.
A few tips:
Don't reuse ODS/DW keys in your dimensions. Merge on the natural key of the data. This decouples your ODS from your star schemas and allows you to rebuild your ODS without affecting the star schemas.
Tracking history and reconstructing a historical position from a complex normalised data is really fiddly. Don't bother; stick with 'current state' in your ODS.
Make a generic type-2 dimension handler (you can use data from the system data dictionary). This allows you to present a pre-load table and use the handler to merge the data to the dimension.
Doing this separates the ETL from the history tracking and allows you to test them in isolation. You can test the ETL by verifying that the preload table is correct - you don't have to test before/after scenarios. You can test history tracking with a set of unit tests for the dimension handler.
This type of architecture simplifies testing dramatically.
One advantage of persistent data marts rather than a persistent ODS is that you can easily do major surgery on the ODS. If the ODS doesn't have to track history it can be dropped and re-created as necessary, and you can easily change the data model without having to migrate historical data.
Upvotes: 1