Chris

Reputation: 4317

What is the simplest method to iterate over 20,000 entities in Google App Engine and export to a text file?

I have a model in Google App Engine that has 50,000+ entities. I would like to create a mapreduce or other operation to iterate over all 50,000+ entities and export the results of a method on the model to a text file. Then once I'm done, I want to download the text file.

What is the easiest way to do this in Google App Engine? I just need to iterate through all the entities and write out the results of export_data() to a common file.

# Example model
from google.appengine.ext import db

class Car(db.Model):
    color = db.StringProperty()

    def export_data(self):
        return self.color

Upvotes: 1

Views: 274

Answers (5)

bshanks

Reputation: 1248

Export via Datastore Backups to Google Cloud Storage and then download:

http://gbayer.com/big-data/app-engine-datastore-how-to-efficiently-export-your-data/

This looks MUCH faster than the other methods. I haven't tried it myself.

Upvotes: 0

Ralph Yozzo

Reputation: 1142

If you only need to export to a file and you want all the entities, you can use the App Engine bulk loader.

See appcfg.py download_data, and https://developers.google.com/appengine/docs/python/tools/uploadingdata

It handles retries, throttling, threading, etc.
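For example, an invocation along the lines documented at the link above (the app ID, URL, and filenames here are placeholders, not tested values):

```shell
# Downloads all Car entities to a local file; assumes the remote_api
# endpoint is enabled in the app. All values below are placeholders.
appcfg.py download_data \
    --application=your-app-id \
    --url=http://your-app-id.appspot.com/_ah/remote_api \
    --filename=cars.csv \
    --kind=Car
```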

Upvotes: 0

tempy

Reputation: 1557

I would do this a different way - and somebody please tell me if there's a weakness here.

I would use a task queue and a cursor. Do your query for the first 1,000 results or so, and output the data to a blobstore file using the experimental blobstore programmatic write API. Then reschedule yourself with the cursor, and on each subsequent iteration pick the query up at the cursor and keep appending to the file, until you're done.

This might be slow - but it won't noticeably affect a running app, and unlike mapreduce it won't spawn a gazillion instances and potentially cost you actual money. It probably won't even result in a single additional instance being spawned.
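The batching loop above can be sketched without any App Engine APIs. Names like BATCH and export_batch are hypothetical; a real version would use a datastore query cursor instead of a list index, re-enqueue a task instead of looping, and append to a blobstore file instead of a list:

```python
# App Engine-free sketch of cursor-based batch export.
BATCH = 1000

def export_batch(entities, cursor, out):
    """Process one batch starting at `cursor`; return the next cursor, or
    None when the data is exhausted (i.e. stop rescheduling)."""
    batch = entities[cursor:cursor + BATCH]
    for entity in batch:
        out.append(entity)  # stands in for appending to the blobstore file
    next_cursor = cursor + BATCH
    return next_cursor if next_cursor < len(entities) else None

def export_all(entities):
    """Drive the batches to completion; a task queue would instead
    reschedule a task carrying the cursor."""
    out = []
    cursor = 0
    while cursor is not None:
        cursor = export_batch(entities, cursor, out)
    return out
```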

Upvotes: 0

schuppe

Reputation: 2033

Use the mapreduce API: https://developers.google.com/appengine/docs/python/dataprocessing/. It also has a BlobstoreOutputWriter which you can use to create a blob and then download that blob.

As per suggestion by Dave, here is an example: http://code.google.com/p/appengine-mapreduce/source/browse/trunk/python/demo/main.py#264
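In the linked demo, the mapper is essentially a function that yields strings, which the BlobstoreOutputWriter appends to the output file. A minimal sketch for the Car model from the question (the surrounding pipeline/yaml wiring is shown in the demo, not here):

```python
def export_map(entity):
    # Called once per Car entity by the mapreduce framework; each yielded
    # string is appended to the output file by the BlobstoreOutputWriter.
    yield entity.export_data() + "\n"
```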

Upvotes: 4

Rick Mangi

Reputation: 3769

I find it easiest to do this sort of thing using the remote API (remote_api); otherwise you're going to have to store the data in the blobstore and then export it when you're done.

The remote API isn't as fast as running the job on App Engine itself, but it's certainly a lot easier.

Upvotes: 2
