Dan

Reputation: 141

dbt vs Azure Data Factory -> Snowflake

I've been looking into dbt to understand what I can & can't do.

To give a bit of context: I currently have Azure Data Factory loading copies of databases into Snowflake, and I'm looking at building up history as copies of each table with timestamps and a row hash, then building a Data Vault and models on top.

I'm currently deciding whether to use dbt or Data Factory for the history layer, and whether dbt would be best placed for building the Vault onwards. I currently have a Data Factory pipeline that creates tables representing the source and lands them in Snowflake. It would be fairly simple to extend this to create the history tables and run a dynamic stored procedure in Snowflake that populates history with a Merge statement, or to implement streams (CDC) within Snowflake.
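
To make that concrete, here is a rough sketch of the kind of Merge I have in mind (purely illustrative; schema, table and column names are placeholders, and the row hash / load timestamp columns are the ones I'd add for history):

    -- Rough sketch only: schemas, tables and columns are placeholders.
    -- land.customer is the landed copy; hist.customer_history keeps every
    -- distinct version of a row, identified by business key + row hash.
    merge into hist.customer_history as tgt
    using (
        select
            customer_id,
            customer_name,
            md5(concat_ws('||', customer_id, customer_name)) as row_hash,
            current_timestamp()                              as load_ts
        from land.customer
    ) as src
        on  tgt.customer_id = src.customer_id
        and tgt.row_hash    = src.row_hash
    when not matched then
        insert (customer_id, customer_name, row_hash, load_ts)
        values (src.customer_id, src.customer_name, src.row_hash, src.load_ts);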

My understanding is that dbt builds each model from a SELECT statement, so I would need a model for every table, each materialised as incremental, which somewhat goes against what I'm looking for because there would be a lot of repeated code. However, I do like the documentation & DAG visualisation.
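
For example, my understanding is that every source table would need its own incremental model roughly like this (placeholder names), which is the repetition I'm trying to avoid:

    -- models/history/customer_history.sql  (placeholder names; one of these per source table)
    {{ config(materialized='incremental') }}

    select
        customer_id,
        customer_name,
        md5(concat_ws('||', customer_id, customer_name)) as row_hash,
        current_timestamp()                              as load_ts
    from {{ source('land', 'customer') }}

    {% if is_incremental() %}
    -- only add row versions whose hash isn't already in the history table
    where md5(concat_ws('||', customer_id, customer_name))
          not in (select row_hash from {{ this }})
    {% endif %}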

My questions are:

  1. Could the Information_Schema be read to dynamically define the sources in the .yml file for models? (i.e. take a list of tables / columns from the Information_Schema).

  2. dbt has "run_query", which I could use to run a Merge statement or Insert/Update/Delete based on CDC. Could I use this within a model referencing a macro to perform an action for each item in a schema (see the rough macro sketch after this list)? Also, would this show through in the auto-generated documentation?

  3. Has anyone seen any good guides for CI/CD from Azure DevOps for dbt?
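
To illustrate point 2, this is roughly the shape of macro I'm picturing (untested sketch; schema and table names are placeholders, and the per-table statement is just a stand-in for the dynamic Merge):

    -- macros/build_history.sql  (untested sketch, placeholder names)
    {% macro build_history(source_schema) %}

        {% set tables_query %}
            select table_name
            from information_schema.tables
            where table_schema = upper('{{ source_schema }}')
        {% endset %}

        {% set results = run_query(tables_query) %}

        {% if execute %}
            {% for table_name in results.columns[0].values() %}
                {# in practice this would be the dynamic Merge / Insert / Update / Delete #}
                {% do run_query('create table if not exists hist.' ~ table_name ~ '_history like ' ~ source_schema ~ '.' ~ table_name) %}
            {% endfor %}
        {% endif %}

    {% endmacro %}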

Any advice / opinions are appreciated.

Thanks, Dan

Upvotes: 1

Views: 3671

Answers (1)

sgdata

Reputation: 2763

I have this active question which is semi-relevant to your point 1, since I think there should be a way to do this during the compile step based on the manifest & catalog, which doesn't require querying multiple schemata in a multi-db instance.

Additionally, I would recommend looking into the bootstrap function of the dbt-helper project, which looks like it meets your use case:

Create starter "schema.yml" files for your project. This function helpfully generates boilerplate dbt-model files for you so you don't have to go through the copy/paste when you're developing a new model.
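
For reference, the boilerplate being generated is roughly one schema.yml entry per model along these lines (illustrative only; model and column names are placeholders, and the exact output depends on the tool and version):

    # models/schema.yml  (illustrative placeholder output)
    version: 2

    models:
      - name: customer_history
        description: ""
        columns:
          - name: customer_id
            description: ""
          - name: row_hash
            description: ""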

I'm not really sure what you're aiming for with point 2, so I'll leave that one to the side for now. In general, there are a lot of things you can do with macros that aren't strictly model building.

Finally, these two articles (one, two) seem like they have some relevant materials for you?

In general, though, I would anticipate that the most common orchestration path will become Airflow, à la this article, but unfortunately Azure has yet to indicate that it will offer a product comparable to AWS's MWAA or GCP's Cloud Composer.

Upvotes: 1
