Reputation: 4317
I have a model in Google App Engine that has 50,000+ entities. I would like to create a mapreduce or other operation to iterate over all 50,000+ entities and export the results of a method on the model to a text file. Then once I'm done, I want to download the text file.
What is the easiest way to do this in Google App Engine? I just need to iterate through all the entities and write out the results of export_data() to a common file.
    # Example model
    from google.appengine.ext import db

    class Car(db.Model):
        color = db.StringProperty()

        def export_data(self):
            return self.color
Upvotes: 1
Views: 274
Reputation: 1248
Export via Datastore Backups to Google Cloud Storage and then download:
http://gbayer.com/big-data/app-engine-datastore-how-to-efficiently-export-your-data/
This looks MUCH faster than the other methods. I haven't tried it myself.
Upvotes: 0
Reputation: 1142
If you only need to export to a file and you want all the entities, you can use the App Engine bulk loader.
See appcfg.py download_data
Also https://developers.google.com/appengine/docs/python/tools/uploadingdata
It handles retries, throttling, threading, etc. for you; a typical invocation is sketched below.
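For reference, the command looks something like this (the app id, kind, and output filename are placeholders; see the linked docs for the full option list):

    appcfg.py download_data \
        --url=http://your-app-id.appspot.com/_ah/remote_api \
        --kind=Car \
        --filename=cars.csv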
Upvotes: 0
Reputation: 1557
I would do this a different way - and somebody please tell me if there's a weakness here.
I would use a task queue and a cursor. Do your query for the first 1,000 results or so and write the data out to a blobstore file using the experimental blobstore programmatic write API. Then reschedule yourself with the cursor, and on each subsequent iteration pick the query up at the cursor and keep appending to the same file until you're done; see the sketch below.
This might be slow - but it won't noticeably affect a running app, and unlike mapreduce it won't spawn a gazillion instances and potentially cost you actual money. It probably won't even result in a single additional instance spawn.
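Here is a rough sketch of that loop, assuming the Car model from the question, the experimental Files API, and an /export_worker task URL of my own invention; the batch size and parameter names are illustrative:

    from google.appengine.api import files, taskqueue
    from google.appengine.ext import webapp

    from models import Car  # the model from the question

    BATCH_SIZE = 1000  # illustrative batch size

    class ExportWorker(webapp.RequestHandler):
        def post(self):
            cursor = self.request.get('cursor')
            file_name = self.request.get('file_name')
            if not file_name:
                # First run: create an appendable blobstore file.
                file_name = files.blobstore.create(mime_type='text/plain')

            q = Car.all()
            if cursor:
                q.with_cursor(cursor)
            cars = q.fetch(BATCH_SIZE)

            # Append this batch's export_data() output to the file.
            with files.open(file_name, 'a') as f:
                for car in cars:
                    f.write(car.export_data() + '\n')

            if len(cars) == BATCH_SIZE:
                # More entities may remain: reschedule at the new cursor.
                taskqueue.add(url='/export_worker',
                              params={'cursor': q.cursor(),
                                      'file_name': file_name})
            else:
                # All done: finalize the file so it can be downloaded
                # via files.blobstore.get_blob_key(file_name).
                files.finalize(file_name)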
Upvotes: 0
Reputation: 2033
Use the mapreduce API: https://developers.google.com/appengine/docs/python/dataprocessing/. It also has a BlobstoreOutputWriter which you can use to create a blob and then download that blob.
As per Dave's suggestion, here is an example: http://code.google.com/p/appengine-mapreduce/source/browse/trunk/python/demo/main.py#264
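For the question's model specifically, the mapper plus pipeline wiring would look roughly like this (a sketch only; the exact params layout differs between versions of the mapreduce library, and 'main.Car' assumes the model lives in main.py):

    from mapreduce import base_handler, mapreduce_pipeline

    def export_car(car):
        # Called once per Car entity; yielded strings go to the output writer.
        yield car.export_data() + '\n'

    class CarExportPipeline(base_handler.PipelineBase):
        def run(self):
            output = yield mapreduce_pipeline.MapperPipeline(
                'car_export',
                handler_spec='main.export_car',
                input_reader_spec='mapreduce.input_readers.DatastoreInputReader',
                output_writer_spec='mapreduce.output_writers.BlobstoreOutputWriter',
                params={
                    'entity_kind': 'main.Car',
                    'mime_type': 'text/plain',
                })
            # When the job finishes, 'output' resolves to the blobstore
            # path(s) of the generated file(s).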
Upvotes: 4
Reputation: 3769
I find it easiest to do this sort of thing using the Remote API; otherwise you're going to have to store the data in the blobstore and then export it when you're done.
The Remote API isn't as fast as running the job on App Engine itself, but it's certainly a lot easier - something like the sketch below.
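A minimal sketch of that approach, run as a local script (the app id, module path, output filename, and batch size are placeholders):

    import getpass

    from google.appengine.ext.remote_api import remote_api_stub

    from models import Car  # the model from the question

    def auth_func():
        return raw_input('Email: '), getpass.getpass('Password: ')

    # Point the local datastore API at the live app.
    remote_api_stub.ConfigureRemoteApi(
        None, '/_ah/remote_api', auth_func, 'your-app-id.appspot.com')

    with open('cars.txt', 'w') as out:
        query = Car.all()
        cursor = None
        while True:
            if cursor:
                query.with_cursor(cursor)
            batch = query.fetch(500)  # illustrative batch size
            if not batch:
                break
            for car in batch:
                out.write(car.export_data() + '\n')
            cursor = query.cursor()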
Upvotes: 2