Chris Marasti-Georg
Chris Marasti-Georg

Reputation: 34650

What is the best way to migrate data from BigTable/GAE Datastore to RDBMS?

Now that Google has announced availability of Cloud SQL storage for app engine, what will be the best way to migrate existing data from BigTable/GAE Datastore to MySQL?

To respond to the excellent questions brought up by Peter:

It seems like the bulk uploader should be able to be used to download the data into a text format that could then be loaded with mysqlimport, but I don't have any experience with either technology. Also, it appears that Cloud SQL only supports importing mysqldumps, so I would have to install MqSQL locally, mysqlimport the data, then dump it, then import the dump?

An example of my current model code, in case it's required:

class OilPatternCategory(db.Model):
    version = db.IntegerProperty(default=1)
    user = db.UserProperty()
    name = db.StringProperty(required=True)
    default = db.BooleanProperty(default=False)

class OilPattern(db.Model):
    version = db.IntegerProperty(default=2)
    user = db.UserProperty()
    name = db.StringProperty(required=True)
    length = db.IntegerProperty()
    description = db.TextProperty()
    sport = db.BooleanProperty(default=False)
    default = db.BooleanProperty(default=False)
    retired = db.BooleanProperty(default=False)
    category = db.CategoryProperty()

class League(db.Model):
    version = db.IntegerProperty(default=1)
    user = db.UserProperty(required=True)
    name = db.StringProperty(required=True)
    center = db.ReferenceProperty(Center)
    pattern = db.ReferenceProperty(OilPattern)
    public = db.BooleanProperty(default=True)
    notes = db.TextProperty()

class Tournament(db.Model):
    version = db.IntegerProperty(default=1)
    user = db.UserProperty(required=True)
    name = db.StringProperty(required=True)
    center = db.ReferenceProperty(Center)
    pattern = db.ReferenceProperty(OilPattern)
    public = db.BooleanProperty(default=True)
    notes = db.TextProperty()

class Series(db.Model):
    version = db.IntegerProperty(default=3)
    created = db.DateTimeProperty(auto_now_add=True)
    user = db.UserProperty(required=True)
    date = db.DateProperty()
    name = db.StringProperty()
    center = db.ReferenceProperty(Center)
    pattern = db.ReferenceProperty(OilPattern)
    league = db.ReferenceProperty(League)
    tournament = db.ReferenceProperty(Tournament)
    public = db.BooleanProperty(default=True)
    notes = db.TextProperty()
    allow_comments = db.BooleanProperty(default=True)
    complete = db.BooleanProperty(default=False)
    score = db.IntegerProperty(default=0)

class Game(db.Model):
    version = db.IntegerProperty(default=5)
    user = db.UserProperty(required=True)
    series = db.ReferenceProperty(Series)
    score = db.IntegerProperty()
    game_number = db.IntegerProperty()
    pair = db.StringProperty()
    notes = db.TextProperty()
    entry_mode = db.StringProperty(choices=entry_modes, default=default_entry_mode)

Upvotes: 1

Views: 668

Answers (1)

Chris
Chris

Reputation: 1162

Have you considered using the Map Reduce framework? You could write mappers that store the datastore entities in CloudSQL. Do not forget to add a column for the datastore key, this might help you avoiding duplicate rows or identifying missing rows.

You might have a look at https://github.com/hudora/gaetk_replication for an inspiration on the mapper functions.

Upvotes: 1

Related Questions