Carson Morrow

Reputation: 317

GAE Datastore performance vs SQLite

I'm seeing terrible performance using the GAE datastore on both the dev server and the production server. I have the following simplified model:

class Team(db.Model):
    name = db.StringProperty()
    # + 1 other property
    # home_games from Game
    # away_games from Game

class Game(db.Model):
    date = db.DateProperty()
    year = db.IntegerProperty()
    home_team = db.ReferenceProperty(Team, collection_name='home_games')
    away_team = db.ReferenceProperty(Team, collection_name='away_games')
    # + 4 other properties
    # results from TeamResults

class TeamResults(db.Model):
    game = db.ReferenceProperty(Game, collection_name='results')
    location = db.StringProperty(choices=('home', 'away'))
    score = db.IntegerProperty()
    # + 17 other properties

I only have one index, on Game year and date. Inserting a small dataset of 478 teams and 786 games took about 50 seconds. A simple query:

games = Game.all()
games.filter('year =', 2000)
games.order('date')

for game in games:
    for result in game.results:
        # do something with the result

took about 45 seconds.

I'm moving from SQLite-based data storage, and the above query on a much larger dataset takes a fraction of a second. Is my data just modeled poorly? Is Datastore just this slow?

Edit 1
To give a little more background: I'm inserting data from a user-uploaded file. The file is uploaded to the blobstore, and then I parse it with csv.reader. This happens periodically, and the queries are run from cron jobs.
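
For reference, the parsing step looks roughly like this (a simplified sketch; blob_key and store_row are placeholders for my real code):

import csv

from google.appengine.ext import blobstore

def parse_upload(blob_key):
    # blob_key identifies the uploaded CSV in the blobstore
    reader = csv.reader(blobstore.BlobReader(blob_key))
    for row in reader:
        # build Team / Game / TeamResults entities from the row and save them
        store_row(row)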

Upvotes: 0

Views: 309

Answers (3)

Dave W. Smith

Reputation: 24966

I don't see any evidence that you're using indexed=False on any of your properties. Each indexed property costs two additional writes per entity put (one for the ascending index, one for the descending index). Those add up quickly.
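
For example, you could mark the properties you never filter or sort on (a sketch based on your TeamResults model; adjust to whichever properties you actually query):

class TeamResults(db.Model):
    game = db.ReferenceProperty(Game, collection_name='results')
    # Properties that are never filtered or ordered on can skip index writes:
    location = db.StringProperty(choices=('home', 'away'), indexed=False)
    score = db.IntegerProperty(indexed=False)
    # ... and likewise for the other properties you don't query against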

Upvotes: 1

jdevelop

Reputation: 12306

Your problem is that you're inserting these records one by one.

You need to use batch inserts; see https://developers.google.com/appengine/docs/python/tools/uploadingdata

Or you may want to insert a list of records, as described in the documentation:

https://developers.google.com/appengine/docs/python/datastore/entities#Batch_Operations
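
For example, collect the entities and write them in a single call (a sketch; make_game stands in for your row-to-entity code):

from google.appengine.ext import db

entities = []
for row in rows:                     # rows parsed from your CSV
    entities.append(make_game(row))  # build the entity, don't put() it yet

db.put(entities)                     # one batch RPC instead of one put() per entity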

Upvotes: 2

voscausa

Reputation: 11706

You do not need the bulk loader, because you have already uploaded the CSV, but you can use a batch insert. See these tips: http://googleappengine.blogspot.nl/2009/06/10-things-you-probably-didnt-know-about.html and look for "5. You can batch put, get and delete operations for efficiency".
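
Something along these lines (a sketch; make_game and the batch size of 200 are just placeholders):

import csv

from google.appengine.ext import blobstore, db

def import_csv(blob_key, batch_size=200):
    # read the CSV you already uploaded straight from the blobstore
    reader = csv.reader(blobstore.BlobReader(blob_key))
    batch = []
    for row in reader:
        batch.append(make_game(row))  # placeholder: build a Game entity from the row
        if len(batch) >= batch_size:
            db.put(batch)             # one RPC for the whole batch
            batch = []
    if batch:
        db.put(batch)                 # flush the remainder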

Upvotes: 0
