Reputation: 1
Would like to know how can I create and load Type 2 (Slowly Changing) dimension table in snowflake cloud database. Most of the cases, the loading of this table involves updates and inserts of records up to 1-50 per day. If we follow the traditional approach then will the snowflake storage will grow significantly in shot time as every small change it will create a separate storage per load.
Upvotes: 0
Views: 693
Reputation: 690
You could bring your data into a permanent table with INSERT/UPDATE/DELETE/MERGE
statements. This table would represent the "current state" of your dimension. On this table you could create a STREAM
object which tracks the changes, then setup a stored procedure to gather the data from the stream and write those changes to an SCD2 table. The procedure can be invoked by a task running on a schedule or invoked from another task that successfully updates the "current state" table. Finally, the SCD2 table could be transient to avoid added costs of fail-safe and added time-travel (as Mike mentioned in his answer).
Flow:
Data into "current state" table --> Task running proc reads from stream --> Updates SCD2
Documentation:
https://docs.snowflake.net/manuals/user-guide/streams.html
Storage is cheap, but it is good to be cost aware.
Upvotes: 1
Reputation: 7339
On inserts and updates, Snowflake creates new micropartitions, but based on what your time-travel settings for the table, then the stale micropartitions will be dropped. So, I wouldn't worry too much about it (space is cheap, too). If you are concerned with the time-travel or fail-safe creating an extra cost, you can use a Transient table, instead of permanent tables. These will not have fail-safe costs.
https://docs.snowflake.net/manuals/user-guide/data-time-travel.html
https://docs.snowflake.net/manuals/user-guide/tables-temp-transient.html#comparison-of-table-types
Hope that helps. Otherwise, there isn't any difference on how you should handle Type 2 dimensions.
Upvotes: 1