Reputation: 7535
I have a Laravel project (API for an iOS app) and am currently setting up a continuous integration server (Jenkins) to handle deployments to AWS. I'm using tools such as Capistrano, Packer and Terraform to accomplish this.
And currently, the app has two environments: Staging and Production.
However, I'm trying to find a good way to work with databases in this system.
Basically, I envision the pipeline being something like:
However, between steps 3 and 4, I'd love to do a "dry run" of the production deployment -- which is to say, trying out migrations, and having access to the potentially large data set that production will have.
So I see 2 options:
1) When we're ready to QA, export the Production DB and import it into Staging. Then run "the process" (migrations, Terraform, Packer, etc.). If all goes well, move to Production (a rough sketch of the export/import step is below).
PROS:
CONS:
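To make option 1 concrete, this is very roughly what I picture the export/import step looking like -- just a sketch, assuming MySQL and hypothetical "production" and "staging" connection entries in config/database.php:

<?php

// Very rough sketch of option 1: an artisan command that pipes a production
// dump straight into the staging database before running migrations there.
// MySQL and the "production" / "staging" connection names are assumptions.

namespace App\Console\Commands;

use Illuminate\Console\Command;

class CloneProductionToStaging extends Command
{
    protected $signature = 'db:clone-production';
    protected $description = 'Copy the production database into staging for QA';

    public function handle()
    {
        $prod    = config('database.connections.production');
        $staging = config('database.connections.staging');

        // Passing passwords on the command line is only acceptable for a
        // throwaway sketch -- use option files for anything real.
        $command = sprintf(
            'mysqldump -h%s -u%s -p%s %s | mysql -h%s -u%s -p%s %s',
            escapeshellarg($prod['host']),
            escapeshellarg($prod['username']),
            escapeshellarg($prod['password']),
            escapeshellarg($prod['database']),
            escapeshellarg($staging['host']),
            escapeshellarg($staging['username']),
            escapeshellarg($staging['password']),
            escapeshellarg($staging['database'])
        );

        exec($command, $output, $status);

        if ($status !== 0) {
            $this->error('Import from production failed.');
            return 1;
        }

        $this->info('Production data imported into staging; now run the migrations.');
        return 0;
    }
}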
2) Instead of importing from Production, write configurable seeders for all the database models and run them as needed for QA (a rough sketch of such a seeder is below).
PROS:
CONS:
You have to keep your seeders up to date with any Model changes you make.
In general, this process seems more subject to human error than exporting the actual data set from Production.
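For option 2, I'm imagining seeders along these lines (rough sketch only -- the App\User model and the SEED_USERS variable are just examples, and it assumes a model factory for User is already defined):

<?php

// Hypothetical seeder for option 2: generates a configurable amount of fake
// data through a model factory, so QA can run against a production-sized set.

use Illuminate\Database\Seeder;

class UsersTableSeeder extends Seeder
{
    public function run()
    {
        // Small data set by default; a big one for QA runs, e.g.
        // SEED_USERS=500000 php artisan db:seed --class=UsersTableSeeder
        $count = (int) env('SEED_USERS', 50);

        factory(App\User::class, $count)->create();
    }
}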
How do people generally approach this process?
Upvotes: 1
Views: 606
Reputation: 56849
Your staging environment wants to look as much like production as possible; otherwise it kind of defeats the point of having it, because it's going to be difficult to QA it or to use it for actually testing that you aren't about to break production.
As such, your database migrations should move with the code: any changes you make to the underlying schema should be committed at the same time as the code that uses those changes, and thus propagated through your CI pipeline at the same time.
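As a minimal illustration (the table and column names here are made up), the schema change ships in the same commit as the code that uses it, and on staging you can preview the SQL with php artisan migrate --pretend before running it for real:

<?php

// database/migrations/..._add_nickname_to_users_table.php
// Hypothetical migration committed alongside the application code that
// starts reading the new column.

use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

class AddNicknameToUsersTable extends Migration
{
    public function up()
    {
        Schema::table('users', function (Blueprint $table) {
            $table->string('nickname')->nullable();
        });
    }

    public function down()
    {
        Schema::table('users', function (Blueprint $table) {
            $table->dropColumn('nickname');
        });
    }
}

Your CI pipeline then just runs php artisan migrate --force in each environment as part of the deploy, so the schema and the code always move together.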
As for actual data, we take regular snapshots of our databases (running on RDS in AWS) and then restore these into our "like live" environments. This means our testing environments have a similar amount of data to production, so we can see the impact of things like a database migration and how long it takes to perform before it hits production.
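As a rough sketch of that restore step using the AWS SDK for PHP (the instance identifiers, region and instance class below are placeholders, not our actual setup):

<?php

// Rough sketch: restore the most recent automated production snapshot into a
// fresh "like live" staging instance. All identifiers are placeholders.

require 'vendor/autoload.php';

use Aws\Rds\RdsClient;

$rds = new RdsClient([
    'region'  => 'eu-west-1',
    'version' => 'latest',
]);

// Find the newest automated snapshot of the production database.
$result = $rds->describeDBSnapshots([
    'DBInstanceIdentifier' => 'myapp-production',
    'SnapshotType'         => 'automated',
]);

$snapshots = $result['DBSnapshots'];
usort($snapshots, function ($a, $b) {
    return $b['SnapshotCreateTime'] <=> $a['SnapshotCreateTime'];
});
$latest = $snapshots[0]['DBSnapshotIdentifier'];

// Spin up the like-live instance from that snapshot (the old staging
// instance would be deleted or renamed separately).
$rds->restoreDBInstanceFromDBSnapshot([
    'DBInstanceIdentifier' => 'myapp-staging',
    'DBSnapshotIdentifier' => $latest,
    'DBInstanceClass'      => 'db.t2.medium',
]);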
We also have some more stripped-down environments for running an extensive automated test suite, but these have minimal generated data that is just enough for running the tests.
In our case we are also handling personally identifiable information so our snapshot process is actually slightly more convoluted as we also randomise any potentially sensitive data, generating new names and contact details etc.
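To give a flavour of that scrubbing step (a simplified sketch -- the table and column names are examples, and it runs against the restored copy, never production):

<?php

// Hypothetical artisan command that scrubs PII on the restored staging copy.
// Table and column names are examples only -- adapt to your own schema.

namespace App\Console\Commands;

use Faker\Factory as Faker;
use Illuminate\Console\Command;
use Illuminate\Support\Facades\DB;

class AnonymiseStagingData extends Command
{
    protected $signature = 'db:anonymise-staging';
    protected $description = 'Replace sensitive fields in the staging database with generated values';

    public function handle()
    {
        $faker = Faker::create();

        // Walk the table in chunks so production-sized data sets don't
        // exhaust memory.
        DB::table('users')->orderBy('id')->chunk(1000, function ($users) use ($faker) {
            foreach ($users as $user) {
                DB::table('users')->where('id', $user->id)->update([
                    'name'  => $faker->name,
                    'email' => $faker->unique()->safeEmail,
                    'phone' => $faker->phoneNumber,
                ]);
            }
        });

        $this->info('Sensitive user data replaced with generated values.');
    }
}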
For you it's likely to come down to how painful it really is to restore data from production. I would say start by doing that, and when it gets too painful or slow, consider moving to generating the data set instead, making sure it's big enough to simulate production or at least give you a good understanding of the real world.
So in your case I would start with something like the process above: restore a recent production snapshot into staging, run your migrations against that copy, then build your new AMI with Packer and deploy it with Terraform (which can find the AMI with the aws_ami data source).
I would suggest using blue/green deploys to roll your new AMI out, using a strategy such as the one outlined here, but that's a separate question in itself (with plenty of resources elsewhere).
Upvotes: 4