Does App Engine Flexible for Python support concurrent requests?

Question

The documentation on how GAE Flexible handles requests says that "An instance can handle multiple requests concurrently," but I don't know exactly what this means.

Let's say my application takes 60 seconds to process a single request.

Suppose that 30 seconds after the instance starts processing the first request (so halfway through it), another request (or three) arrives. Will the new requests be handled by the same instance, or will they trigger autoscaling and spin up more instances? Assume that CPU utilization while processing the first request is still below the scaling CPU-utilization threshold.

Because my instance takes 60 seconds to process a single request and I will be receiving multiple requests at a time, I'm worried that I'll trigger autoscaling inefficiently even when there is enough processing power to handle additional requests on the same instance. Is this how it works? Ideally, I would like to multi-thread my processing and accept additional requests on the same instance while it is still under the CPU utilization threshold.

Documentation on concurrent requests is scarce for the Flexible environment, unlike for the Standard environment, so I want to be sure.

Alex · Accepted Answer

Perhaps 'number of workers' is the config setting you're looking for:

https://cloud.google.com/appengine/docs/flexible/python/runtime#recommended_gunicorn_configuration

Gunicorn uses workers to handle requests. By default, Gunicorn uses sync workers. This worker class is compatible with all web applications, but each worker can only handle one request at a time. By default, gunicorn only uses one of these workers. This can often cause your instances to be underutilized and increase latency in applications under high load.
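As a rough illustration of what that could look like, here is a minimal sketch of a gunicorn.conf.py (Gunicorn config files are plain Python). The `workers`, `worker_class`, `threads`, and `timeout` names are real Gunicorn settings, but the specific values, the "two per core plus one" rule of thumb, and the helper `suggested_workers` are assumptions for illustration, not something taken from the question. Tune them for your own workload.

```python
# gunicorn.conf.py (illustrative sketch; all values are assumptions to tune)
import multiprocessing


def suggested_workers(cpu_count: int) -> int:
    """Hypothetical helper: a common rule of thumb of two workers per core, plus one."""
    return cpu_count * 2 + 1


# Number of worker processes. With the default of 1 sync worker,
# an instance would handle only one request at a time.
workers = suggested_workers(multiprocessing.cpu_count())

# Threaded worker class, so each worker process can serve
# several requests concurrently instead of one at a time.
worker_class = "gthread"

# Threads per worker (assumed value). Rough concurrent capacity
# per instance is workers * threads.
threads = 4

# Worker timeout in seconds, raised above Gunicorn's 30-second default
# because the requests in question take about 60 seconds each.
timeout = 120
```

You would then point the entrypoint in app.yaml at it, for example `gunicorn -c gunicorn.conf.py -b :$PORT main:app`, where `main:app` is a placeholder for your own module and WSGI app object.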

And it sounds like you've already seen that you can specify the CPU utilization threshold:

https://cloud.google.com/appengine/docs/flexible/python/reference/app-yaml#automatic_scaling

You can also use something other than Gunicorn if you prefer. Here's one of their examples, where they use Honcho instead:

https://github.com/GoogleCloudPlatform/getting-started-python/blob/master/6-pubsub/app.yaml
