Reputation: 1
I've finally made the jump from on-prem to cloud and have set up an initial data warehouse with a couple of basic star schemas. I'm nearly done with the Data Factory pipelines that load them.
Now on to the CI/CD question(s): how do you keep the data warehouse and data pipeline releases in sync? We have a UAT environment and a PROD environment, and I want to make sure that changes are managed in such a way that the warehouse data structure and the ADF pipelines stay in line with each other.
What do people use to keep them in line?
I've considered just manually keeping track of release numbers via an Azure DevOps release process - the data warehouse structure builds fine, and I can set it to auto-build and auto-deploy if required. But how can I keep releases of the warehouse structure and the pipelines in sync when I come to promote changes to production?
So if 1.1.0 of the warehouse structure is compatible with 1.1.0 through 1.1.2 of the ADF pipelines, does this simply need documenting in Confluence or similar and managing manually?
Upvotes: 0
Views: 73
Reputation: 3816
As mentioned in the comments, you can keep them in sync with a combination of version control, CI/CD pipelines, and compatibility mapping.
For example, store both your database schema scripts and your ADF pipeline definitions in a single Git repository, and use version tags (e.g. v1.1.0) to mark compatible states of the two; a layout along these lines is sketched below.
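One possible layout, assuming a SQL project for the schema and the JSON that ADF generates for the pipelines (all names here are illustrative):

    datawarehouse-repo/
        db/                   -- schema scripts (e.g. a DACPAC project or migration scripts)
        adf/                  -- ADF pipeline/dataset/linked-service JSON
        compatibility.json    -- maps warehouse versions to compatible ADF versions
        azure-pipelines.yml   -- build and release definition

Tagging this repo once (git tag v1.1.0) then pins the warehouse structure and the pipelines as a single releasable unit.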
Better still, set up your release pipeline so that it deploys database and ADF changes together, sequencing the deployment so database changes are applied first, followed by the ADF pipelines; a sketch of that sequencing follows.
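A minimal sketch of that release step, assuming the schema is applied with sqlcmd and the ADF pipelines are deployed from the ARM template ADF generates on publish (the adf_publish branch). The server, database, resource group, and file names are placeholders, not anything from your setup:

    import subprocess
    import sys

    # All names below are placeholders - substitute your own server,
    # database, resource group, and script paths.
    SQL_SERVER = "mydw.database.windows.net"
    DATABASE = "dw_uat"
    RESOURCE_GROUP = "rg-dw-uat"

    def run(cmd):
        """Run a command and abort the release if it fails."""
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, check=True)

    def deploy():
        # 1. Apply database schema changes first, so the tables the
        #    pipelines load exist before the pipeline definitions change.
        run(["sqlcmd", "-S", SQL_SERVER, "-d", DATABASE, "-G",
             "-i", "db/deploy.sql"])

        # 2. Deploy the ADF pipelines from the ARM template that ADF
        #    generates when you publish (on the adf_publish branch).
        run(["az", "deployment", "group", "create",
             "--resource-group", RESOURCE_GROUP,
             "--template-file", "adf/ARMTemplateForFactory.json",
             "--parameters", "adf/ARMTemplateParametersForFactory.json"])

    if __name__ == "__main__":
        try:
            deploy()
        except subprocess.CalledProcessError as exc:
            sys.exit(f"Deployment step failed: {exc}")

Because both steps run in the same release, a failed database deployment stops the ADF deployment, so the two artifacts cannot drift apart mid-release.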
Lastly, as a good practice, maintain a compatibility file (something like compatibility.json) in your repo to track which database versions work with which ADF pipeline versions, and have the release fail fast when the pair being promoted isn't listed.
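The format is up to you; one simple shape, with a small gate script you could run at the start of the release (the format and names are illustrative, not a standard):

    import json

    # Example compatibility.json (an illustrative format, not a standard):
    # {
    #   "1.1.0": ["1.1.0", "1.1.1", "1.1.2"],
    #   "1.2.0": ["1.2.0"]
    # }
    # Keys are warehouse structure versions; values are the ADF pipeline
    # versions known to work against that structure.

    def is_compatible(dw_version: str, adf_version: str,
                      path: str = "compatibility.json") -> bool:
        """Return True if the ADF version is listed against the DW version."""
        with open(path) as f:
            matrix = json.load(f)
        return adf_version in matrix.get(dw_version, [])

    if __name__ == "__main__":
        # Fail the release early if the pair being promoted is not listed.
        if not is_compatible("1.1.0", "1.1.2"):
            raise SystemExit("Incompatible warehouse/ADF versions - aborting release.")
        print("Versions compatible - proceeding with deployment.")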
For further clarity, check out Microsoft's documentation on continuous integration and delivery in Azure Data Factory.
Upvotes: 0