Why are we experiencing huge latency on one autoscaled Google App Engine instance when several others are available?

Question

Our autoscaling parameters in app.yaml are as follows:

automatic_scaling:
  min_idle_instances: 3
  max_idle_instances: automatic
  max_pending_latency: 30ms
  max_concurrent_requests: 20

The result is 3 resident instances and typically 2-6 dynamic instances (depending on traffic), but the load distribution among the instances seems inefficient. In the screenshot below we see one instance handling the vast majority of requests, with a massive 21s latency (over the last minute).

To me this indicates there must be something wrong with our setup to explain these high latencies.

Has anyone experienced issues like this with GCP or App Engine?

konqi · Accepted Answer

Idle instances aren't used to balance current load. They bridge the gap while new dynamic instances are spinning up. In your setup it might be worth trying just one or two idle instances and tuning the min and max pending latency.
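
As a rough sketch of that suggestion, the app.yaml section might look something like the following. The specific numbers are illustrative assumptions, not values taken from the question or tested against this app; only the setting names come from App Engine's automatic scaling options.

```yaml
automatic_scaling:
  # Assumed starting point: fewer idle instances than the original 3.
  min_idle_instances: 1
  max_idle_instances: automatic
  # Assumed values: how long a request may wait in the pending queue
  # before a new instance is allowed (min) or forced (max) to start.
  min_pending_latency: 30ms
  max_pending_latency: 100ms
  # Unchanged from the original configuration.
  max_concurrent_requests: 20
```

Changing one setting at a time and watching the instance graphs should make it easier to see which one actually affects the load distribution.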

Pending latency measures how long a request waits in the queue before an instance starts handling it. The latency you see in your screenshot is the time between request and response. A single request that takes 21 seconds to process would produce a figure like that, while the pending latency could still be below 30ms.

You should check your logs to see which requests take so long, and probably break them up into smaller chunks of work. Many small jobs scale much better than a few huge jobs. With lots of small jobs queued, pending latency will also go up, which will cause your app to scale properly.
