user12761950

Reputation: 149

In Snowflake, how to make new data available only once all related objects have been loaded?

Let's say we have a dimensional data model in our Snowflake data warehouse that is loaded once a day. The requirement is that the data loaded today should become available to end users only when ALL objects have been loaded successfully.

So if dimension d1 has been loaded successfully today, but the fact f1 load fails for whatever reason, then the end users should keep seeing yesterday's data (for dimension d1 as well). The flip over to the new data should take place only when all dimensional model objects have been loaded successfully.

Currently we do it by staging the incremental set of new data to staging tables. Once everything is staged successfully, a huge SQL script does a DELETE/INSERT on one dimensional model table at a time, with a single commit at the very end. So the final data load is not done in parallel.
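A minimal sketch of that pattern (the names stg_d1, stg_f1, d1 and f1 are just placeholders):

    -- Single explicit transaction: end users keep seeing yesterday's data
    -- until the COMMIT at the end flips all tables over at once.
    BEGIN;

    DELETE FROM d1;
    INSERT INTO d1 SELECT * FROM stg_d1;

    DELETE FROM f1;
    INSERT INTO f1 SELECT * FROM stg_f1;

    -- ... one DELETE/INSERT pair per dimensional model table ...

    COMMIT;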

Is there a better way to achieve the requirement? We were thinking about materialized views (load all the underlying dimensional model tables and then refresh the "select * from table" materialized views), but they are refreshed automatically in Snowflake. A DROP target table / CLONE from source to target table approach does not suit us either, since there is a chance that the target table is queried at the moment it does not exist.

Upvotes: 1

Views: 250

Answers (2)

Rajib Deb

Reputation: 1774

OK, I think I understand your problem statement, but let me rephrase it to be sure. You have a number of dimension tables and a fact table. For example:

You have D1, D2, D3 as dimension tables and a fact table F1. Your daily ETL process loads F1, D1, D2 and D3, but you want the new data in D1, D2, D3 to become visible only when the F1 load is also successful, and during the load you do not want users to be affected.

If this is correct, one approach would be as below (this approach has the disadvantage of additional storage cost for the duration of the ETL):

When your ETL starts, clone new tables F1', D1',D2',D3' from the respective F1, D1,D2,D3 tables. Load the incremental data first to F1',D1',D2' and D3'. If the load is successful for all these tables do an ALTER SWAP and swap F1' to become F1, D1' to become D1, D2' to become D2 and D3' to become D3. Post that you can truncate F1',D1',D2'D3' and repeat this process everyday.
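A minimal sketch of that flow for one of the tables, assuming F1_NEW as the name of the clone and STG_F1 as a placeholder staging table:

    -- Zero-copy clone at the start of the ETL; no data is duplicated yet
    CREATE OR REPLACE TABLE F1_NEW CLONE F1;

    -- Load the increment into the clone while users keep querying F1
    INSERT INTO F1_NEW SELECT * FROM STG_F1;

    -- Only after ALL clones have loaded successfully: swap the names
    ALTER TABLE F1 SWAP WITH F1_NEW;

    -- F1_NEW now holds the pre-swap data; truncate it for the next run
    TRUNCATE TABLE F1_NEW;

One caveat: each SWAP WITH is atomic for its own pair of tables, but ALTER is DDL and commits on its own, so the swaps of F1, D1, D2, D3 do not happen inside one transaction.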

Upvotes: 1

Gokhan Atil

Reputation: 10134

I do not understand why you eliminated the CLONE option. You do not need to clone each object; you can clone the entire database or schema with one command and make that clone available to your users. The next day, when all data is loaded, you re-clone it. It seems like an ideal solution for what you want to achieve.

https://community.snowflake.com/s/article/cloning-in-snowflake
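As a sketch, assuming the ETL loads into a private schema DW_LOAD and end users query DW_PUBLIC (both names are made up here):

    -- Daily ETL writes into DW_LOAD.D1, DW_LOAD.F1, ...
    -- while end users keep querying DW_PUBLIC

    -- Once ALL loads have succeeded, republish everything in one
    -- metadata operation; the clone copies no data
    CREATE OR REPLACE SCHEMA DW_PUBLIC CLONE DW_LOAD;

One thing to verify in your account is how grants on the replaced schema behave after each CREATE OR REPLACE, as they may need to be re-applied.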

The CLONE operation does not copy data; it is a metadata operation. Therefore it will not occupy extra space unless the cloned data is removed from the original/source tables, or you add new data to the cloned objects:

https://docs.snowflake.com/en/user-guide/tables-storage-considerations.html#cloning-tables-schemas-and-databases

Upvotes: 0
