Reputation: 599
We are trying to setup dev
and qa
environments using data from a prod
environment.
We are not using CREATE DATABASE dev CLONE prod
because we are trying to avoid cloning database-specific objects like stages
and pipes
, since we are using per-environment Terraform to manage pipe-related objects and want to avoid out-of-band changes to those objects.
On top of that, there are some tables that should not be cloned from prod->dev. I'm trying to design a cleaner solution than the cobbled mess that we have.
We have a scheduled script that does the following:
prod
and dev
databases and fetch the right src
and dst
schemasSELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = '<>' AND TABLE_TYPE = 'BASE TABLE'
to get tables to clone
CREATE OR REPLACE TABLE <dev> CLONE <prod> COPY GRANTS;
GET_DDL(<dev>)
to see if the table has sequences/constraints to updateCREATE OR REPLACE SEQUENCE <dev> CLONE <prod>
to update the nextval
of the sequence since our table was cloned and references the sequence from the source database (and it also has the wrong value anyways)ALTER TABLE <dev> ALTER COLUMN <> SET DEFAULT <new seq>.nextval
ALTER TABLE <dev> DROP CONSTRAINT <>
since the cloned tables reference the source databaseALTER TABLE <dev> ADD CONSTRAINT <>
to rebuild them to reference the destination databaseSo... it works, but it's hacky, fragile, and prone to updating because of custom rules. We currently have this running on an AWS lambda, but a first step would be to migrate this to pure Snowflake.
Does anyone have any suggestions to improve this process? Or at least have recommendations on Snowflake tools that
Upvotes: 0
Views: 955
Reputation: 9768
I realise this is not really an answer to your question but I would absolutely not do what you are proposing to do - it's not the way to manage your SDLC (in my opinion) and, especially if your data contains any PII information, copying data from a Prod to a non-Prod database runs the risk of all sorts of regulatory and audit issues.
I would do the following:
You can then use these scripts to build any environment and you would change them by going through the standard Dev, Test, Prod SDLC.
As far as populating these environments with data goes, if you really need Production-like data (and production volumes of data) then you should build routines for copying the data from Prod to the chosen target environment that, where necessary, anonymise the data. These scripts should be managed in your code repository and as part of your SDLC there should be a requirement to build/update the script for any new/changed table
Upvotes: 1