Reputation: 554
This is probably something any team will encounter at some point, so I'm counting on the experience other folks have had.
We are in the process of migrating an old MySQL database to a new database whose structure has changed quite a bit. Some tables were split into multiple smaller tables, and some data was merged from multiple smaller tables into one larger table.
We ran a test and it takes a few hours to migrate the database to its new form. The problem is that the old database is our production database and changes every minute. We cannot afford a few hours of downtime.
What approach do you think would be ok in such a situation?
Let's say you have a table called "users" with 1M rows. It's being changed every second: some fields are updated, some rows are added, and some rows are deleted. That's why we cannot just take a snapshot at a certain point in time: by the time the migration is done, we would have 3 hours of unsynced data.
Upvotes: 3
Views: 1934
Reputation: 29639
One approach I've used in the past was to use replication.
We created a replication scheme between the old production database and a slave which was used for the data migration. When we started the migration, we switched off the replication temporarily, and used the slave database as the data source for the migration; the old production system remained operational.
Once the migration script had completed, and our consistency checks had run, we re-enabled replication from the old production system to the replicated slave. Once the replication had completed, we hung up the "down for maintenance" sign on production, re-ran the data migration scripts and consistency checks, pointed the system to the new database, and took down the "down for maintenance" sign.
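For reference, a minimal sketch of those pause/resume steps on the slave, assuming classic MySQL master/slave replication is already configured (on MySQL 8.0.22+ the statements are spelled STOP/START REPLICA):

    -- On the slave: pause replication before using it as the migration source
    STOP SLAVE;

    -- ... run the bulk migration scripts and consistency checks against the slave ...

    -- Let the slave catch up with production again
    START SLAVE;

    -- Check progress; wait until Seconds_Behind_Master reaches 0
    SHOW SLAVE STATUS\G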
There was downtime, but it was minutes, rather than hours.
This approach does depend on your database schema making it easy to identify changed and new data.
If your schema does not lend itself to easy querying to find new or changed records, and you don't want to add new columns to keep track of this, the easiest solution is to create separate tables to keep track of the migration status.
For instance:
TABLE: USERS (your normal, replicated table)
----------------------
USER_ID
NAME
ADDRESS
.....
TABLE: USERS_STATUS (keeps track of changes, only exists on the "slave")
-----------------
USER_ID
STATUS
DATE
You populate this table via a trigger on the USERS table for insert, delete and update - for each of those actions, you set a separate status.
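As a rough sketch, the tracking table and triggers on the slave could look like this (assuming statement-based replication, since row-based replication does not fire triggers on the replica; all names are illustrative):

    -- Only exists on the slave; never created on production
    CREATE TABLE users_status (
        user_id INT      NOT NULL,
        status  CHAR(1)  NOT NULL,  -- 'I' = inserted, 'U' = updated, 'D' = deleted
        date    DATETIME NOT NULL
    );

    CREATE TRIGGER users_ins AFTER INSERT ON users
    FOR EACH ROW INSERT INTO users_status VALUES (NEW.user_id, 'I', NOW());

    CREATE TRIGGER users_upd AFTER UPDATE ON users
    FOR EACH ROW INSERT INTO users_status VALUES (NEW.user_id, 'U', NOW());

    CREATE TRIGGER users_del AFTER DELETE ON users
    FOR EACH ROW INSERT INTO users_status VALUES (OLD.user_id, 'D', NOW());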
This allows you to quickly find all records that changed since you ran your first migration script, and only migrate those records.
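The second migration pass then only needs to look at the rows recorded there, along these lines (@first_run_started is a hypothetical marker you'd record before the first pass):

    -- Fetch only the records that changed after the first migration pass began
    SELECT s.user_id, s.status, u.*
    FROM users_status s
    LEFT JOIN users u ON u.user_id = s.user_id  -- LEFT JOIN: deleted rows have no match
    WHERE s.date > @first_run_started;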
Because you're not modifying your production environment, and the triggers only fire on the "slave" environment, you shouldn't introduce any performance or instability problems on the production environment.
Upvotes: 1
Reputation: 7665
Could you run the new database in parallel with the current one? That way you can later migrate the old data from your old db to your new one, and your "live" changes will already have been captured in the new one.
What I mean is: when you write something to the old db, you will also have to write the data to the new one.
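If both schemas happen to live on the same MySQL server, a minimal sketch of such a dual write is one transaction per change; all schema and column names below are made up for illustration:

    -- Every application write goes to both schemas atomically
    START TRANSACTION;

    -- The old, monolithic table
    INSERT INTO old_db.users (user_id, name, address)
    VALUES (42, 'Alice', '1 Main St');

    -- The same data, split across the new, smaller tables
    INSERT INTO new_db.user_profiles (user_id, full_name)
    VALUES (42, 'Alice');
    INSERT INTO new_db.user_addresses (user_id, address)
    VALUES (42, '1 Main St');

    COMMIT;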
Upvotes: 0
Reputation: 1571
There's one approach I used once that should work for you too; however, you'll need to modify your production datasets for it. Briefly: add a "last changed" timestamp (or a "migrated" flag) to every table involved, and have the migration script only pick up the datasets that changed since its previous run.
That way you can run the migration script as often as you like.
You will still have downtime, but it will be minimal, because during that downtime you only have to migrate a few datasets (practically the last "delta" between the final run of the migration script and now).
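A minimal sketch of such a modification in MySQL, assuming a "last changed" timestamp suits your tables (note this alone does not catch deletes; those would need a tombstone table or a trigger); @last_migration_run is just an illustration:

    -- Add an auto-maintained change timestamp to each production table
    ALTER TABLE users
        ADD COLUMN last_changed TIMESTAMP NOT NULL
            DEFAULT CURRENT_TIMESTAMP
            ON UPDATE CURRENT_TIMESTAMP;

    -- Each migration run only processes rows changed since the previous run
    SELECT *
    FROM users
    WHERE last_changed > @last_migration_run;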
Upvotes: 0