Reputation: 6879
Following up on this question here.
I finally wrote a code-generation tool to wrap all my database data into something like this:
Pdtfaamt(fano=212373,comsname='SMM',pdtcode='20PLFCL',kind='1',fatype='S',itemno='A3',itemamt=75,type=0).save()
Pdtfaamt(fano=212374,comsname='SMM',pdtcode='20PLFCL',kind='1',fatype='S',itemno='E1',itemamt=75,type=0).save()
Pdtfaamt(fano=212375,comsname='SMM',pdtcode='20PLFCL',kind='1',fatype='S',itemno='E6',itemamt=75,type=0).save()
Pdtfaamt(fano=212376,comsname='SMM',pdtcode='20PLFCL',kind='1',fatype='C',itemno='A3',itemamt=3,type=1).save()
Yes, that's right! I pulled the entire database out and transformed the data into population instruction code so that I can migrate my database up to GAE.
I deployed the django-nonrel project and used the django-nonrel remote API to trigger the data-population process.
It works okay, except for one problem: it's extremely slow. Could anyone tell me how to improve the speed? By my calculations, it could take up to 30 days to get all my data up and running on GAE.
PS: I am using django-nonrel and djangoappengine for the backend.
Upvotes: 1
Views: 264
Reputation: 14175
Write your import script to take advantage of Python's multiprocessing.Pool:
import multiprocessing

def import_thing(data):
    thing = ThingEntity(**data)  # ThingEntity is your model class
    thing.put()

def main():
    data = [{'fano': '212374', 'comsname': 'SMM', },
            {'fano': '212375', 'comsname': 'SMM', },
            # ...etc
           ]
    pool = multiprocessing.Pool(4)  # split the work across 4 processes
    pool.map(import_thing, data)
Since the AppEngine production servers like having lots of connections you should play around with the pool size to find the best number. This will not work for importing to the dev server as it's single-threaded.
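One rough way to play around with the pool size is to time the same job list at a few sizes and compare (a minimal sketch; `work` and `time_pool` are stand-ins of mine, not part of the answer's code — replace `work` with your actual upload call):

```python
import time
from multiprocessing import Pool

def work(x):
    # stand-in for a single entity upload; swap in your put() call
    return x * x

def time_pool(size, jobs):
    # run the same job list through a pool of the given size and time it
    pool = Pool(size)
    start = time.time()
    pool.map(work, jobs)
    pool.close()
    pool.join()
    return time.time() - start

if __name__ == '__main__':
    for size in (2, 4, 8):
        print(size, time_pool(size, list(range(1000))))
```

Against the real App Engine endpoint the sweet spot depends on round-trip latency, so measure with actual uploads rather than local stand-in work.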
Also important: make sure you put entities in batches of, say, 10-20 rather than one at a time, or the round-trips will kill your performance. An improved script should work in chunks like:
def import_batch(batch):
    # one batched RPC per chunk instead of a round-trip per entity
    db.put([ThingEntity(**item) for item in batch])

data = [
    [item1, item2, item3],
    [item4, item5, item6],
    [item7, item8, item9],
]
pool.map(import_batch, data)
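To build those chunks from a flat list of row dicts, a simple slicing helper is enough (a sketch; `chunked` is my name for it, not something from the answer):

```python
def chunked(items, size):
    # split a flat list into consecutive batches of at most `size` items
    return [items[i:i + size] for i in range(0, len(items), size)]

# example: 45 rows in batches of 20 -> batches of 20, 20, and 5
rows = [{'fano': 212373 + i, 'comsname': 'SMM'} for i in range(45)]
batches = chunked(rows, 20)
```

Each inner list then becomes one `import_batch` call, so the number of round-trips drops by the batch size.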
Upvotes: 2