bmurr

Reputation: 141

How to use queries efficiently in Google App Engine with Python?

Let's say I have an entity of kind Dog in my datastore. I want to perform a simple operation on all Dogs, but I have a lot of dogs.

all_dogs = Dog.all(keys_only=True)
print all_dogs.count(100000)  # returns 79234, or some equally large number

If I simply do this:

for dog_key in all_dogs:
    k = dog_key

which I understand is the same as doing:

for dog_key in all_dogs.run(batch_size=20):
    k = dog_key

then I'll get a datastore timeout exception like this:

Timeout: The datastore operation timed out, or the data was temporarily unavailable.

If I increase the batch_size to 1000, then I'll have no problems.

What causes the operation timer to start, and how long does it take to time out? How can I ensure that I don't get timeouts?

In this case, increasing the batch_size helped, but what if I had millions of Dog entities? How can I ensure I don't get timeouts when performing operations on them?

Upvotes: 0

Views: 145

Answers (1)

voscausa

Reputation: 11706

If you have a lot of dogs and you want to process all the entities:

  • you can use the map/reduce library.

And if you want to program yourself:

  • you can use tasks, which can run for up to 10 minutes
  • you can chain tasks using a cursor (getting around the 10-minute deadline)
  • you can use a backend
  • you can use batch operations for efficiency
  • you can use keys-only operations for efficiency
  • you can use asynchronous operations for efficiency
  • you can use projection queries for efficiency
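
To make the "chain tasks using a cursor" idea concrete, here is a minimal sketch of the control flow only. It does not use the App Engine API: fetch_batch, process_batch, run_task and run_all are hypothetical names, and the "datastore" is a plain list with an integer standing in for a query cursor, so it can be run anywhere. On App Engine, the same roles would roughly be played by the query's cursor() and with_cursor() methods, with each step enqueued as a new task (for example via deferred.defer) instead of being called in a loop.

```python
# Stand-in for a datastore query: NOT the App Engine API, just a list
# sliced by an integer "cursor" so the control flow can be run anywhere.
FAKE_DOG_KEYS = ["dog-%d" % i for i in range(2500)]


def fetch_batch(cursor, batch_size):
    """Return (keys, next_cursor); next_cursor is None when exhausted."""
    start = cursor or 0
    keys = FAKE_DOG_KEYS[start:start + batch_size]
    end = start + len(keys)
    next_cursor = end if end < len(FAKE_DOG_KEYS) else None
    return keys, next_cursor


def process_batch(keys, results):
    """Hypothetical per-batch work; here it only records the keys."""
    results.extend(keys)


def run_task(cursor, results, batch_size=1000):
    """One 'task': handle a single batch and return the cursor to resume from.

    On App Engine, instead of returning, this step would enqueue a new
    task carrying the cursor, so each task stays well under its deadline.
    """
    keys, next_cursor = fetch_batch(cursor, batch_size)
    process_batch(keys, results)
    return next_cursor


def run_all(batch_size=1000):
    """Drive the chain locally: call run_task until no cursor is left."""
    results = []
    tasks_run = 0
    cursor = None
    while True:
        cursor = run_task(cursor, results, batch_size)
        tasks_run += 1
        if cursor is None:
            break
    return results, tasks_run
```

Because each step only touches one batch and hands the cursor on, the total number of entities no longer matters for any single deadline; only the batch size does.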

Upvotes: 3
