orion3

Reputation: 9935

Saving database data changes in commits (if not SQLite)

We have a Rails project that uses PostgreSQL-specific queries, so it's impractical to use SQLite even in development mode. The problem is, I'd love to track schema changes in the database and avoid having to run migrations by hand, and I'd also love to track DB data changes with git, so that I wouldn't need to dump the database and load it on my machine. So basically I only want to do 'git pull' and see the application working with the new schema and data.

What are the possible strategies here? The only one that comes to mind is a simple wrapper that takes an SQL query, checks whether it has any DB-specific parts, and rewrites it for the development environment, so that we could still use SQLite (a rough sketch of the idea follows). What else?
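For illustration, here is a minimal sketch of what such a wrapper might look like. The module name and rewrite rules are hypothetical and cover only a couple of well-known PostgreSQL-isms; a real rewriter would need an actual SQL parser rather than regular expressions:

```ruby
# Hypothetical query-rewriting wrapper (illustrative only, not a real gem).
# It handles a few known PostgreSQL-isms; anything beyond that would need
# a proper SQL parser.
module QueryRewriter
  REWRITES = {
    /\bILIKE\b/i => "LIKE", # SQLite's LIKE is case-insensitive by default
    /::\s*\w+/   => "",     # drop PostgreSQL cast syntax such as ::text
    /\bTRUE\b/   => "1",    # SQLite stores booleans as integers
    /\bFALSE\b/  => "0"
  }.freeze

  def self.for_sqlite(sql)
    REWRITES.reduce(sql) { |out, (pattern, replacement)| out.gsub(pattern, replacement) }
  end
end

QueryRewriter.for_sqlite("SELECT * FROM users WHERE name ILIKE '%bob%' AND active = TRUE")
# => "SELECT * FROM users WHERE name LIKE '%bob%' AND active = 1"
```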

Upvotes: 1

Views: 679

Answers (2)

Taichman

Reputation: 1118

If I understand correctly, you want to be able to track both schema and data changes.

These are, in fact, two very different things -

  • Schema changes - this is discussed in several other questions on this site. The main takeaway from the answers is that you can either dump your schema to SQL files and track them with your regular source control (git, svn, etc.), or you can use DB-specific software (Red Gate, DBmaestro); a sketch of the dump-and-track approach follows this list. However, this won't allow you to completely re-create an identical copy of a DB on another server. Which brings me to -
  • Data changes - this is harder, because (as @jonathan wrote) it's difficult to track the changes the DB makes to its files. I suggest you check out OffScale DataGrove. DataGrove tracks changes to the entire DB (structure + data). You can tag versions at any point in time, and return to older states of the DB with a simple command. It also allows you to create virtual, separate copies of the same database so each team member can have their own separate DB. All the virtual copies are tracked in the same repository, so it's super-easy to revert your DB to someone else's version (what you called "git pull").
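As a rough illustration of the first bullet, the following Rake task dumps the schema to a version-controlled SQL file. The database name and output path are assumptions; note that Rails can do much the same thing natively if you set config.active_record.schema_format = :sql:

```ruby
# Sketch of the "dump the schema to SQL and track it in git" approach,
# meant to live in a Rakefile. The database name and output path are
# illustrative.
namespace :db do
  desc "Dump the PostgreSQL schema to db/structure.sql for version control"
  task :dump_schema do
    sh "pg_dump --schema-only --no-owner --no-privileges myapp_development > db/structure.sql"
    sh "git add db/structure.sql"
  end
end
```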

Disclaimer - I work at OffScale :-)

Upvotes: 0

Jonathan Leffler

Reputation: 753455

I'm not sure I understand all the nuances of your question - particularly the comments about using SQLite vs PostgreSQL. If it is to be a multi-DBMS system, then testing with multiple systems is good; if it is to be a single-DBMS system, then working with multiple DBMSs makes life pointlessly hard.

Also, you talk about tracking the schema changes in the database...is this storing information about schema changes separately from the DBMS's own system catalog, or do you really mean that you want to track database schema changes using something outside the database, such as a VCS?

You also talk about tracking 'DB-data changes' which I take to mean 'the data in the tables in the database'. Again, I'm not clear if you are thinking of some sort of dump of the data from the database that covers the differences between what was there, say, a day ago and what is there now, or something else.

These issues might be why you didn't get a response for over 4 hours.

When you talk about a 'simple wrapper', you are not talking about something that I'd call simple. It has to parse arbitrary SQL, work out whether any of it is DBMS-specific, and then apply rewrite rules. That is a non-trivial undertaking. Getting the wrapper called in the right places could be non-trivial too - it depends on the set of APIs you are using to access the DBMS, amongst other things.

What else?

  • Use the same DBMS in both production and development?
  • Tracking just schema changes is non-trivial. You need to track the essence of the schema (such as table names, column names, etc.) and not the accidence (yeah, I was rereading Brooks' "No Silver Bullet" earlier), such as the TabID (which might vary without the schema being materially different). However, an analysis would tell you whether the schema is different.
  • Tracking the data changes, independent of schema changes, is also non-trivial. In general, the volume of such data is large. You may be able to deal with a full archive or a full unload or export of the database - but ensuring that the data is presented in the same sequence each time may require some care on your part. If you don't ensure the correct sequencing, the VCS will record huge changes due to ordering differences alone (see the sketch after this list).
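To make the sequencing point concrete, here is a rough sketch of a deterministic data dump using the pg gem, so that successive dumps diff cleanly in a VCS. The database name, output path, and the assumption that ordering by the first column is stable are all illustrative:

```ruby
# Rough sketch: dump every table's rows in a stable order so the VCS sees
# only real changes, not row-order noise. Assumes PostgreSQL and the pg
# gem; connection details and output path are illustrative.
require "pg"

conn = PG.connect(dbname: "myapp_development")

# List the user tables in the public schema.
tables = conn.exec(
  "SELECT tablename FROM pg_tables WHERE schemaname = 'public'"
).map { |row| row["tablename"] }

File.open("db/data_dump.sql", "w") do |f|
  tables.sort.each do |table|
    f.puts "-- #{table}"
    # COPY over an ordered subselect keeps each table's rows in a stable
    # order across dumps (assumes ordering by the first column is stable).
    conn.copy_data("COPY (SELECT * FROM #{conn.quote_ident(table)} ORDER BY 1) TO STDOUT") do
      while (line = conn.get_copy_data)
        f.write(line)
      end
    end
  end
end
```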

All the above amounts to the dreaded "it depends" answer. It depends on:

  • Your DBMS
  • Your database size
  • The volatility of your schema
  • The volatility of your data

It only marginally depends on your VCS or platform, fortunately.

Upvotes: 2
