What happens to an App Engine request if autoscaling can't create a new instance?

Question

Suppose the instance limit has been reached: a request comes in and sits in the queue long enough, but App Engine autoscaling can't start a new instance.

What happens to this request? Is it kept in the queue indefinitely or is it aborted after some time?

Waelmas · Accepted Answer

The request is aborted. The user receives the message "Rate exceeded." and the logs show the error "Request was aborted after waiting too long to attempt to service your request."

Here's how I tested it:

I created a Timer class that prints the elapsed time of each request, to confirm that the requests really were running concurrently, and a basic Python app whose handler sleeps for 20 seconds. In app.yaml I set max_instances to 1 and max_concurrent_requests to 1. Opening 5 tabs with the app URL and loading them at the same time then causes at least one of them to fail with the errors mentioned above.
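To see why some of the 5 requests have to fail, here is a rough back-of-the-envelope model of that setup: one instance, one request at a time, 20 seconds per request, all requests arriving together. The function name and the max_wait_seconds value are my own assumptions for illustration; I don't know the actual limit App Engine applies to queued requests, and the real scheduler is certainly more involved than this.

```python
def simulate_queue(num_requests, service_seconds, max_wait_seconds):
    """Model a single instance that serves one request at a time.

    All requests arrive at t=0. max_wait_seconds is a HYPOTHETICAL queue
    timeout chosen for illustration, not a documented App Engine value.
    Returns a list of (request_index, needed_wait_seconds, outcome) tuples.
    """
    results = []
    instance_free_at = 0.0  # when the single instance can take the next request
    for i in range(num_requests):
        needed_wait = instance_free_at  # arrival is t=0, so wait == free time
        if needed_wait > max_wait_seconds:
            # The request would have to queue longer than the assumed limit.
            results.append((i, needed_wait, "aborted"))
        else:
            results.append((i, needed_wait, "served"))
            instance_free_at += service_seconds
    return results


if __name__ == "__main__":
    # 5 tabs, 20 s handler, assumed 30 s queue limit.
    for index, wait, outcome in simulate_queue(5, 20, 30):
        print(f"request {index}: needs to wait {wait:.0f}s -> {outcome}")
```

With these numbers the first two requests are served and the remaining three are aborted. The exact split depends entirely on the assumed limit, but with max_instances and max_concurrent_requests both set to 1 the later requests can only wait.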

Tested on GAE Standard

timer.py:

import time
class TimerError(Exception):
    """A custom exception used to report errors in use of Timer class"""
class Timer:
    def __init__(self):
        self._start_time = None
    def start(self):
        """Start a new timer"""
        if self._start_time is not None:
            raise TimerError("Timer is running. Use .stop() to stop it")
        self._start_time = time.perf_counter()
    def stop(self):
        """Stop the timer, and report the elapsed time"""
        if self._start_time is None:
            raise TimerError("Timer is not running. Use .start() to start it")
        elapsed_time = time.perf_counter() - self._start_time
        self._start_time = None
        print(f"Elapsed time: {elapsed_time:0.4f} seconds")

main.py:

from flask import Flask

app = Flask(__name__)

@app.route('/')
def hello():
    import time
    from timer import Timer
    t = Timer()
    t.start()
    print('Started')
    time.sleep(20)
    t.stop()
    return 'Hello World!'

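if __name__ == '__main__':
    # Local development only (host and port are assumed values). On App Engine
    # the runtime's web server imports `app` and this block does not run.
    app.run(host='127.0.0.1', port=8080, debug=True)
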
requirements.txt:

Flask==1.1.2
codetiming

app.yaml:

service: scaling
runtime: python37
instance_class: F1
automatic_scaling:
  target_cpu_utilization: 0.65
  min_instances: 1
  max_instances: 1
  min_pending_latency: 30ms  # default value
  max_pending_latency: automatic
  max_concurrent_requests: 1

Deploy:

gcloud app deploy

Then: Open 5 tabs with the link of the deployed app at the same time.

Results: User gets: "Rate exceeded." GAE logs show: ERROR "Request was aborted after waiting too long to attempt to service your request."
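If you'd rather not open the tabs by hand, the same burst can be sent from a small script. This is only a sketch: APP_URL is a placeholder you need to replace with your own deployed URL, and the function names are mine. It uses only the standard library, so it doesn't need anything from requirements.txt.

```python
import time
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Placeholder URL (hypothetical); replace with the address of your deployed service.
APP_URL = "https://YOUR_SERVICE_URL/"


def fetch(url, timeout=120):
    """Return (status_code, body_text, elapsed_seconds) for one GET request."""
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
            body = resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # Error responses are raised as HTTPError; the body is still readable.
        status = err.code
        body = err.read().decode("utf-8", "replace")
    return status, body.strip(), time.perf_counter() - start


def burst(url, count=5, fetch_fn=fetch):
    """Send `count` requests at once; return results in submission order."""
    with ThreadPoolExecutor(max_workers=count) as pool:
        return list(pool.map(fetch_fn, [url] * count))


def report(results):
    """Format one line per result, e.g. for printing."""
    return [f"{status} after {elapsed:.1f}s: {body[:60]}"
            for status, body, elapsed in results]
```

Run it with `print("\n".join(report(burst(APP_URL))))`. In my test above, the failed requests are the ones whose body is "Rate exceeded." instead of "Hello World!".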
