aphrid
aphrid

Reputation: 599

For Snowflake prod->dev scheduled cloning, what is a good way to handle custom rules?

We are trying to setup dev and qa environments using data from a prod environment.

We are not using CREATE DATABASE dev CLONE prod because we are trying to avoid cloning database-specific objects like stages and pipes, since we are using per-environment Terraform to manage pipe-related objects and want to avoid out-of-band changes to those objects.

On top of that, there are some tables that should not be cloned from prod->dev. I'm trying to design a cleaner solution than the cobbled mess that we have.

We have a scheduled script that does the following:

So... it works, but it's hacky, fragile, and prone to updating because of custom rules. We currently have this running on an AWS lambda, but a first step would be to migrate this to pure Snowflake.

Does anyone have any suggestions to improve this process? Or at least have recommendations on Snowflake tools that

Upvotes: 0

Views: 955

Answers (1)

NickW
NickW

Reputation: 9768

I realise this is not really an answer to your question but I would absolutely not do what you are proposing to do - it's not the way to manage your SDLC (in my opinion) and, especially if your data contains any PII information, copying data from a Prod to a non-Prod database runs the risk of all sorts of regulatory and audit issues.

I would do the following:

  1. As a one-off exercise, create the scripts necessary to build the objects for your "standard" environment - presumably basing this off your current Prod environment
  2. Manage these scripts in a version-controlled repository e.g. Git

You can then use these scripts to build any environment and you would change them by going through the standard Dev, Test, Prod SDLC.

As far as populating these environments with data goes, if you really need Production-like data (and production volumes of data) then you should build routines for copying the data from Prod to the chosen target environment that, where necessary, anonymise the data. These scripts should be managed in your code repository and as part of your SDLC there should be a requirement to build/update the script for any new/changed table

Upvotes: 1

Related Questions