Winston Chen

Reputation: 6879

Is there any way to make the Django remote API run faster on GAE?

Following up on this question here.

I finally wrote a code-generation tool that wraps all my database data in statements like this:

Pdtfaamt(fano=212373,comsname='SMM',pdtcode='20PLFCL',kind='1',fatype='S',itemno='A3',itemamt=75,type=0).save()
Pdtfaamt(fano=212374,comsname='SMM',pdtcode='20PLFCL',kind='1',fatype='S',itemno='E1',itemamt=75,type=0).save()
Pdtfaamt(fano=212375,comsname='SMM',pdtcode='20PLFCL',kind='1',fatype='S',itemno='E6',itemamt=75,type=0).save()
Pdtfaamt(fano=212376,comsname='SMM',pdtcode='20PLFCL',kind='1',fatype='C',itemno='A3',itemamt=3,type=1).save()

Yes, that's right! I pulled the entire database out and transformed the data into population code so that I can migrate my database to GAE.
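For illustration, a generator along these lines could produce that output (a sketch only; the source connection, query, and column order are assumptions):

import MySQLdb  # assumption: the source database is MySQL

conn = MySQLdb.connect(user='...', passwd='...', db='...')  # details omitted
cur = conn.cursor()
cur.execute("SELECT fano, comsname, pdtcode, kind, fatype, itemno, itemamt, type"
            " FROM pdtfaamt")

with open('populate.py', 'w') as out:
    for fano, comsname, pdtcode, kind, fatype, itemno, itemamt, type_ in cur:
        # emit one population statement per source row, e.g.
        # Pdtfaamt(fano=212373,comsname='SMM',...).save()
        out.write("Pdtfaamt(fano=%d,comsname=%r,pdtcode=%r,kind=%r,fatype=%r,"
                  "itemno=%r,itemamt=%d,type=%d).save()\n"
                  % (fano, comsname, pdtcode, kind, fatype, itemno, itemamt, type_))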

So I deployed the django-nonrel project and used the django-nonrel remote API to trigger the data population process.

It works, except for one problem: it's extremely slow. Could anyone tell me how to improve the speed? From some rough calculations, it may take up to 30 days to get all my data up and running on GAE.

PS: I am using django-nonrel, with djangoappengine as the backend.

Upvotes: 1

Views: 264

Answers (2)

Chris Farmiloe

Reputation: 14175

Write your import script to take advantage of Python's multiprocessing Pool:

import multiprocessing

def import_thing(data):
    # build one entity from a dict of field values and write it to the datastore
    thing = ThingEntity(**data)
    thing.put()

def main():
    data = [{'fano': 212373, 'comsname': 'SMM'},
            {'fano': 212374, 'comsname': 'SMM'},
            # ...etc
            ]
    pool = multiprocessing.Pool(4)  # 4 worker processes running in parallel
    pool.map(import_thing, data)

Since the App Engine production servers handle many concurrent connections well, you should play around with the pool size to find the best number. This will not work for importing to the dev server, as it's single-threaded.

Also important: make sure you put entities in batches of, say, 10-20 rather than one at a time, or the round trips will kill your performance. An improved script should work in chunks, as shown below (a sketch of import_batch follows the example):

batches = [
    [item1, item2, item3],
    [item4, item5, item6],
    [item7, item8, item9],
]
pool.map(import_batch, batches)
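A minimal sketch of what import_batch could look like, assuming the entities use the raw App Engine datastore API as in the snippet above (db.put() accepts a list of entities, so each batch costs a single round trip):

from google.appengine.ext import db

def import_batch(batch):
    # build all entities in the batch, then write them with one
    # datastore call instead of one put() per entity
    entities = [ThingEntity(**item) for item in batch]
    db.put(entities)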

Upvotes: 2

Daniel Roseman

Reputation: 599490

You probably want to look into the Mapper API.
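For context, a rough sketch of how that might look with the appengine-mapreduce library: the raw data is uploaded once (e.g. to the Blobstore) and the entities are built server-side, avoiding the per-row remote API round trips entirely. The reader, handler name, and CSV layout here are illustrative assumptions:

# mapreduce.yaml (illustrative):
#   mapreduce:
#   - name: Import Pdtfaamt rows
#     mapper:
#       input_reader: mapreduce.input_readers.BlobstoreLineInputReader
#       handler: importer.process_line

from mapreduce import operation as op

def process_line(entry):
    # BlobstoreLineInputReader yields (byte_offset, line) tuples
    _, line = entry
    fano, comsname, pdtcode = line.split(',')[:3]  # illustrative CSV layout
    yield op.db.Put(Pdtfaamt(fano=int(fano), comsname=comsname, pdtcode=pdtcode))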

Upvotes: 1
