Reputation: 859
I want to implement SCD2 on Snowflake tables. My source and target tables are both in Snowflake, and the entire process has to be done using Azure Data Factory. I went through the documentation provided by Azure for implementing SCD2 using data flows, but when I tried to create a dataset for the Snowflake connection, it shows as disabled.
Is there any way, or any documentation, where I can see the steps to create SCD2 in ADF with Snowflake tables?
Thanks, Vipendra
Upvotes: 0
Views: 1592
Reputation: 633
If your source and target tables are both in Snowflake, you could use Snowflake Streams to do this. There's a blog post covering this in more detail at https://community.snowflake.com/s/article/Building-a-Type-2-Slowly-Changing-Dimension-in-Snowflake-Using-Streams-and-Tasks-Part-1
However, in short, if you have a source table named source, you can put a stream on it like so:
create or replace stream source_changes on table source;
This will capture all the changes that are made to the source table. You can then build a view on that stream that establishes how you want to feed those changes into the SCD table. (The blog post uses case statements to add start and end dates to each row in the view.)
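As a rough illustration only, a view like the following classifies each stream record using the stream's METADATA$ACTION and METADATA$ISUPDATE columns and stamps it with start/end dates. The column names (id, name) and the open-ended end date are placeholders; your own dimension attributes and conventions will differ:

create or replace view source_changes_vw as
select
    id,
    name,
    current_timestamp()    as start_date,   -- row becomes effective now
    to_date('9999-12-31')  as end_date,     -- open-ended until superseded
    case
        when metadata$action = 'INSERT' and not metadata$isupdate then 'INSERT'
        when metadata$action = 'INSERT' and metadata$isupdate     then 'UPDATE'
        when metadata$action = 'DELETE' and not metadata$isupdate then 'DELETE'
    end as dml_type
from source_changes;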
From there, you can use a Snowflake Task to automate the process of loading from the stream into the SCD table, and only run it when the stream actually has changes.
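A minimal sketch of such a task, gated on SYSTEM$STREAM_HAS_DATA so it only runs when the stream has captured changes. The warehouse name and the target table dim_source_history are placeholders, and in practice the body would be the full SCD2 MERGE described in the blog post rather than this simple insert:

create or replace task load_scd2_task
    warehouse = my_wh          -- placeholder warehouse name
    schedule  = '5 minute'
when
    system$stream_has_data('SOURCE_CHANGES')   -- skip the run entirely if nothing changed
as
    insert into dim_source_history
    select id, name, start_date, end_date, dml_type
    from source_changes_vw;    -- consuming the stream here advances its offset

-- Tasks are created suspended, so remember to start it:
alter task load_scd2_task resume;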
Upvotes: 0
Reputation: 3838
SCD2 in ADF can be built and managed graphically via data flows. However, the Snowflake connector for ADF does not yet work directly with data flows. So for now, you will need to use the Copy Activity in an ADF pipeline to stage the dimension data in Blob or ADLS, then build your SCD2 logic in data flows using the staged data.
Your pipeline will look something like this:
[Copy Activity Snowflake-to-Blob] -> [Data Flow SCD2 logic Blob-to-Blob] -> [Copy Activity Blob-to-Snowflake]
We are working on direct connectivity to Snowflake from data flows and hope to land that soon.
Upvotes: 1