Reputation: 7586
This is the case where the instance limit has been reached: a request sits in the queue long enough, but App Engine autoscaling can't start a new instance.
What happens to this request? Is it kept in the queue indefinitely or is it aborted after some time?
Upvotes: 2
Views: 95
Reputation: 1962
It returns the message "Rate exceeded." to the user and writes the following error to the logs: "Request was aborted after waiting too long to attempt to service your request."
Here's how I tested it:
I wrote a small timer class to measure the elapsed time and confirm that I am indeed executing multiple concurrent requests, plus a basic Python app whose handler sleeps for 20 seconds. In app.yaml I set max_instances to 1 and max_concurrent_requests to 1. Then, by simply opening 5 tabs with the app URL and loading them at the same time, at least one of them fails with the errors mentioned above.
Tested on GAE Standard
timer.py:
import time


class TimerError(Exception):
    """A custom exception used to report errors in use of Timer class"""


class Timer:
    def __init__(self):
        self._start_time = None

    def start(self):
        """Start a new timer"""
        if self._start_time is not None:
            raise TimerError("Timer is running. Use .stop() to stop it")
        self._start_time = time.perf_counter()

    def stop(self):
        """Stop the timer, and report the elapsed time"""
        if self._start_time is None:
            raise TimerError("Timer is not running. Use .start() to start it")
        elapsed_time = time.perf_counter() - self._start_time
        self._start_time = None
        print(f"Elapsed time: {elapsed_time:0.4f} seconds")
main.py:
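import time

from flask import Flask

from timer import Timer

app = Flask(__name__)


@app.route('/')
def hello():
    # Hold the request for 20 seconds so that concurrent requests pile up.
    t = Timer()
    t.start()
    print('Started')
    time.sleep(20)
    t.stop()
    return 'Hello World!'


if __name__ == '__main__':
    # Local development only. The original snippet omitted this body, so this
    # call is an assumption; on App Engine the runtime serves `app` itself.
    app.run(host='127.0.0.1', port=8080)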
requirements.txt:
Flask==1.1.2
codetiming
app.yaml:
service: scaling
runtime: python37
instance_class: F1
automatic_scaling:
  target_cpu_utilization: 0.65
  min_instances: 1
  max_instances: 1
  min_pending_latency: 30ms  # default value
  max_pending_latency: automatic
  max_concurrent_requests: 1
Deploy:
gcloud app deploy
Then: Open 5 tabs with the link of the deployed app at the same time.
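If opening tabs by hand is awkward, the same load can be scripted. This is only a rough sketch, not what I ran for the test above: APP_URL is a placeholder you have to replace, and fetch and fire_concurrent are helper names I made up for illustration. It uses only the standard library and sends the requests from a thread pool so they are all in flight at once.

```python
import time
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Hypothetical placeholder; replace with the URL of your deployed service.
APP_URL = "https://YOUR_SERVICE_URL/"


def fetch(url, timeout=120):
    """Send one GET request and return (status, body, elapsed_seconds)."""
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
            body = resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # Error responses are raised as HTTPError; keep their status and
        # body so that a rejected request is reported instead of crashing.
        status = err.code
        body = err.read().decode("utf-8", "replace")
    return status, body.strip(), time.perf_counter() - start


def fire_concurrent(url, n=5, fetch_fn=fetch):
    """Send n requests at the same time; results are in submission order."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        return list(pool.map(fetch_fn, [url] * n))
```

Calling fire_concurrent(APP_URL) and printing each returned tuple shows, per request, the status code, the response body, and how long the request took, which makes it easy to spot the requests that were rejected rather than served.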
Results: The user gets "Rate exceeded." and the GAE logs show the ERROR "Request was aborted after waiting too long to attempt to service your request."
Upvotes: 2