Synchronizing databases between testing and production environment with git

Question

I am trying to implement git in our development process for php/mysql applications.

So far I have git repos set up, and we are using our local machines for testing and we are successfully using it for the files, but I don't really know how to handle the database?

There is a cron script on the server that exports and commits a fresh version of the live database to the repo, and I have set up a post-merge hook on my development machine to update my local database, and that works well for syncing in the direction production->testing.

I don't know how to make the sync the other way around. At the moment I am doing changes on the database on the live server, but I don't think that is a good practice.

So I am open to suggestions.

EDIT: I asked this question, before I understood how schema migrations work. I am now using django for most of my web development work which has a great migrations module called South (part of django core from version 1.7). So most MVC frameworks should have a migrations module that deals with this issue.

bishop · Accepted Answer

If you want to version control a full database implemented in flat files, I recommend using an sqlite backend.

If your application requires MySQL (eg certain procedures or views), then version control the database manipulation scripts (eg CREATE, ALTER) and the population scripts (eg, INSERT) and run those as part of your build process. dbdeploy is a good pattern to follow.

Update to Comment on your Existing Procedure
You are essentially treating your production as the master copy of the schema. What happens if you ever have multiple masters? What happens when your database becomes too large to effectively do this? What if you need to isolate a bug in an older version of your code -- how easy is it to rollback to a particular code/schema state?

Ten years ago, I did this very thing. I had a small database -- a dozen tables or so. The application has grown. It has 2000 tables now. Consider me a ghost of Christmas Future warning you away from the pain I've been through.

Recommendation
The process is easy in theory: make database changes in development, apply those to production, done. The reality is much harder. I'll explain what I mean by example...

You are part of a development team. You want to add support for disabling accounts. Besides the code changes, you decide to implement that as a bit field in the database, like this:

ALTER TABLE Account ADD COLUMN disabled BIT(1) DEFAULT 0;

Now like I mentioned above, you put this in a file under version control, let's call it addDisabledColumnToAccount.sql. You commit the SQL and the code changes and push them.

The other team members get your change, apply the schema change, and test it. You all collectively decide to increase the original scope of the change to support a status on an account, not just disabled. So you do this:

ALTER TABLE Account DROP COLUMN disabled;
ALTER TABLE Account ADD COLUMN status ENUM('active','disabled','closed') NOT NULL DEFAULT 'active';

and commit your changes, and push. Everybody reapplies this and is happy. You push it into your trunk (or however you mark stable) and it's ready to go to production.

Ok. Can you spot the problem? There are several. The first major problem is that if you try and run this script on production, it will fail. Why? Because production never had the "disabled" column added.

The general issue is that development databases go through interim changes that production need not/may not go through. There are lots of ways to get around this. The way I like to do it is to keep two tracks of change scripts: development and production. Development scripts are cumulative, accreting changes as development proceeds -- production scripts represent the final result of development. They both leave the database ending at the same state, which means your final tests against the final of either should both pass.

What other issues are there? Well, developers might name two script files the same thing. So you have to establish some standard that keeps script names from colliding. Generally if the script names are associated with some unique issue ID, that will do it.

Another problem is that each schema change script might have a MySQL version, an SQLite version, an Oracle version, etc. The more databases you support, the more schema changes you have to deal with. Keep this in mind as you name your scripts. Something like addColumnToTable.mysql.sql, addColumnToTable.sqlite.sql, etc.

Finally you have to ensure schema changes are applied at the same time corresponding code change are applied. What I'm saying is that generally code and database changes go hand in hand. So long as your deploy technology lets you apply both at the same time, and -- importantly -- rollback both if there's a problem, you're good. But that level of automation can be hard, and doing it by hand is error prone.

In summary, git will help you track your changes, and help you to cooperatively create changes to your schema. But it is not particularly a deployment tool, owing to the complexities of dealing with changes in schema over time.

Synchronizing databases between testing and production environment with git

Answers (1)

Related Questions