Reputation: 3102

MySQL schema source control

At my company, several developers work on internal projects, each with their own VirtualBox setup. We use SVN for source control, but we occasionally need a database (MySQL) schema change that has to be propagated to all of the other developers. At the moment we keep a manually written log file that lists what was changed and the SQL needed to perform the change.

I'm hoping there might be a better solution -- ideally one linked to SVN, e.g. if you update to revision 893 the system knows this requires database revision 183 and updates your local schema automagically. We're not concerned with the data being synched, just the schema.

Of course one solution would be to have all developers running off a single, central database; this however has the disadvantage that a schema change could break everyone else's build until they do an svn up.

Upvotes: 4

Answers (4)

Joel B Fant

Reputation: 24766

There are a couple of approaches I've used before or currently use:

Sequential Version Number

Most teams that use this approach have a separate program that reads a version number from the database, executes any statements associated with database versions higher than that number, and finally updates the version number in the database.

So if the version is 37 and there are statements associated with version 1 through 38 in the upgrading application, it will skip 1 through 37 and execute statements to bring the database to version 38.
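As a rough illustration of the version-number loop described above, here is a minimal sketch. It uses Python's built-in sqlite3 as a stand-in for MySQL so that it runs on its own; the `MIGRATIONS` dictionary, the `schema_version` table layout, and the function names are assumptions made for this example, not part of any particular tool.

```python
import sqlite3

# Hypothetical upgrade statements, keyed by the version they produce.
MIGRATIONS = {
    1: ["CREATE TABLE user (id INTEGER, name VARCHAR(20))"],
    2: ["ALTER TABLE user ADD COLUMN email VARCHAR(20)"],
    3: ["ALTER TABLE user ADD COLUMN phone VARCHAR(20)"],
}


def current_version(conn):
    """Return the stored schema version, or 0 for a fresh database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_version (version INTEGER NOT NULL)"
    )
    row = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    return row[0] or 0


def upgrade(conn, migrations=MIGRATIONS):
    """Run every migration above the stored version, lowest first."""
    version = current_version(conn)
    for target in sorted(v for v in migrations if v > version):
        for statement in migrations[target]:
            conn.execute(statement)
        # Record the new version once its statements have run.
        conn.execute("DELETE FROM schema_version")
        conn.execute("INSERT INTO schema_version (version) VALUES (?)", (target,))
        conn.commit()
    return current_version(conn)
```

Calling `upgrade(conn)` a second time does nothing, because every migration is at or below the stored version by then.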

I've seen implementations that also allow for downgrade statements for each version to undo what the upgrade did, and this allows for taking a database from version 38 back down to version 37.
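A downgrade-capable variant might pair each version with a statement that undoes it. The sketch below is only an assumption about how such a tool could be laid out: the `STEPS` table and `migrate_to` are hypothetical names, sqlite3 stands in for MySQL, and the caller keeps track of the current version to keep the example short.

```python
import sqlite3

# Hypothetical (upgrade, downgrade) statement pairs, keyed by version.
STEPS = {
    1: ("CREATE TABLE account (id INTEGER)", "DROP TABLE account"),
    2: ("CREATE TABLE invoice (id INTEGER)", "DROP TABLE invoice"),
}


def migrate_to(conn, current, target, steps=STEPS):
    """Move the schema from `current` to `target`, in either direction."""
    if target > current:
        for version in range(current + 1, target + 1):
            conn.execute(steps[version][0])
    else:
        # Undo the newest versions first.
        for version in range(current, target, -1):
            conn.execute(steps[version][1])
    conn.commit()
    return target
```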

In my situation we had this database upgrading in the application itself and did not have downgrades. Therefore, changes were source-controlled because they were part of the application.

Directed Acyclic Graph

In a more recent project I came up with a different approach. I use classes that are nodes of a directed acyclic graph to encapsulate the statements to do specific upgrades to the database for each specific feature/bugfix/etc. Each node has an attribute to declare its unique name and the names of any nodes on which it was dependent. These attributes are also used to search the assembly for all upgrade nodes.

A default root node is given as the dependency node for any nodes without dependencies, and this node contains the statements to create the migrationregister table that lists the names of nodes that have already been applied. After sorting all the nodes into a sequential list, they are executed in turn, skipping the ones that are already applied.
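To make the ordering step concrete, here is a small sketch of sorting migration nodes by dependency and skipping the ones already applied. It is not the original implementation (which uses classes and attributes discovered in an assembly); the `NODES` dictionary, the `ROOT` name, and `plan` are assumptions for illustration, and the sort relies on `graphlib.TopologicalSorter` from the Python 3.9+ standard library.

```python
from graphlib import TopologicalSorter

ROOT = "root"  # hypothetical name for the default root node

# Hypothetical nodes: name -> (dependency names, upgrade statements).
NODES = {
    "add-user-table": ([], ["CREATE TABLE user (id INTEGER)"]),
    "add-user-email": (["add-user-table"],
                       ["ALTER TABLE user ADD COLUMN email TEXT"]),
    "add-order-table": (["add-user-table"],
                        ["CREATE TABLE orders (id INTEGER)"]),
}


def plan(nodes, applied):
    """Return the names of unapplied nodes in a dependency-respecting order."""
    graph = {ROOT: set()}
    for name, (deps, _statements) in nodes.items():
        # Nodes that declare no dependencies hang off the root node.
        graph[name] = set(deps) or {ROOT}
    order = TopologicalSorter(graph).static_order()
    return [name for name in order if name not in applied]
```

In a real runner, `applied` would be read from the register table, and each planned node's statements would be executed and its name recorded as applied.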

This is all contained in a separate application from the main application, and they are source-controlled in the same repository so that when a developer finishes work on a feature and the database changes associated with it, they are committed together in the same changeset. If you pull the changes for the feature, you also pull the database changes. Also, the main application simply needs a list of the expected node names. Any extra or missing, and it knows the database does not match.
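The check the main application performs can be as simple as a set comparison. A minimal sketch, with `schema_mismatch` as a hypothetical helper name:

```python
def schema_mismatch(expected, applied):
    """Return (missing, extra) node names; two empty lists mean a match."""
    expected, applied = set(expected), set(applied)
    return sorted(expected - applied), sorted(applied - expected)
```

Here `expected` would be the list shipped with the application and `applied` the names read from the register table.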

I chose this approach because the project often has parallel development by multiple developers, with each developer sometimes having more than one thing in development (branchy development, sometimes very branchy). Juggling database version numbers was quite a pain. If everybody starts with version 37, and "Alice" starts on something and uses version 38 to change her database, and "Bob" also starts on work that has to change the database and also uses version 38, one of them will eventually have to renumber. So let's say Bob finishes and pushes to the server. Now Alice, when she pulls Bob's changeset, has to renumber her statements to version 39 and set her database version back to 37 so that Bob's changes get executed, but then her own statements, which her database already has, execute again.

With the graph approach, all that happens when Alice pulls Bob's changeset is that there is a new migration node and another line in the list of node names to check against, and things just work.

We use Mercurial (distributed) rather than SVN (client-server), so that's part of why this approach works so well for us.

Upvotes: 2

Tim

Reputation: 6961

I'd consider looking at something like MyBatis Schema Migration tools. It isn't exactly what you describe, but I think it solves your problem in an elegant way and can be used without pulling in core MyBatis.

If you roll your own, what I've always done is keep a base schema file that creates the schema from scratch, plus a delta file to which all schema changes are appended as deltas, separated by version numbers (you can try to use SVN revision numbers, but I always find it easier to increment manually). Then have a schema_version table that holds the version of the live database, record the same version in the canonical schema file, and have a script that runs all changes in the delta file that are later than the existing DB version.

So you'd have a schema like:

-- Version: 1
CREATE TABLE user (
id bigint,
name varchar(20))

You have the tool manage the schema version table and see something like:

> SELECT * FROM schema_version;
1,2011-05-05

Then you have a few people add to the schema and have a delta file that would look like:

-- Version: 2
ALTER TABLE user ADD email varchar(20);
-- Version: 3
ALTER TABLE user ADD phone varchar(20);

And a corresponding new schema checked in with:

-- Version: 3
CREATE TABLE user (
id bigint,
name varchar(20),
email varchar(20),
phone varchar(20))

When you run the delta script against a database with the initial schema (Version 1), it will read the value from the schema_version table and apply all deltas greater than that to your schema. This gets trickier when you start dealing with branches, but serves as a simple starting point.
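For what the selection step of such a script might look like, here is a minimal sketch that splits a delta file on its `-- Version: N` markers and returns the blocks newer than the database version. The function name `pending_deltas` and the exact marker pattern are assumptions based on the examples above; executing the returned SQL and updating schema_version is left to the caller.

```python
import re

# Matches marker lines such as "-- Version: 2".
VERSION_MARKER = re.compile(r"^--\s*Version:\s*(\d+)\s*$")


def pending_deltas(delta_text, db_version):
    """Return [(version, sql)] for every delta block newer than db_version."""
    blocks = {}
    version = None
    for line in delta_text.splitlines():
        stripped = line.strip()
        marker = VERSION_MARKER.match(stripped)
        if marker:
            version = int(marker.group(1))
            blocks[version] = []
        elif version is not None and stripped:
            blocks[version].append(stripped)
    return [(v, "\n".join(blocks[v])) for v in sorted(blocks) if v > db_version]
```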

Upvotes: 2

Jay

Reputation: 27472

An easy solution would be to keep a complete schema in SVN (or whatever library). That is, every time you change the schema, run MySQL "desc" to dump out descriptions of all the tables, overwrite the last such schema dump with this, and then commit. Then if you run a version diff, it should tell you what changed. You would, of course, need to keep all the tables in alphabetical order (or some predictable order).

For a different approach: Years ago I worked on a project for a desktop application where we were periodically sending out new versions that might have schema changes, and we wanted to handle these with no user intervention. So the program had a description of what schema it expected. At startup it did some metadata calls to check the schema of the database that it actually had and compared these to what it expected. It then automatically updated the schema to match what it expected. Usually when we added a new column we could simply let it start out null or blank, so this required pretty much zero coding effort once we got the first version to work. When there was some actual manipulation required to populate new fields, we'd have to write custom code, but that was relatively rare.
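A rough sketch of that compare-and-update idea follows. It is not the original program: the `EXPECTED` description and `sync_schema` are hypothetical, and sqlite3's `PRAGMA table_info` stands in for the metadata calls (with MySQL you would more likely query `information_schema.columns`). It only handles the easy case mentioned above, where a new column can start out null.

```python
import sqlite3

# Hypothetical description of the schema the program expects.
EXPECTED = {"customer": {"id": "INTEGER", "name": "TEXT", "email": "TEXT"}}


def sync_schema(conn, expected=EXPECTED):
    """Create missing tables and add missing columns; return statements run."""
    executed = []
    for table, columns in expected.items():
        # Column names the database actually has (empty if the table is absent).
        actual = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
        if not actual:
            body = ", ".join(f"{name} {ctype}" for name, ctype in columns.items())
            statements = [f"CREATE TABLE {table} ({body})"]
        else:
            statements = [
                f"ALTER TABLE {table} ADD COLUMN {name} {ctype}"
                for name, ctype in columns.items()
                if name not in actual
            ]
        for statement in statements:
            conn.execute(statement)
            executed.append(statement)
    conn.commit()
    return executed
```

Dropped or retyped columns, and anything that needs data to be populated, would still need the custom code the answer mentions.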

Upvotes: 1

Nate

Reputation: 1921

One option is a data dictionary in YAML/JSON. There is a nice article here

Upvotes: 2
