Riley Hun

Reputation: 2785

FastAPI: Some requests are failing due to 10s timeout

We have deployed a model prediction service in production that is using FastAPI and unfortunately, some of the requests are failing due to a 10s timeout. In terms of concurrent requests, we typically only load about 2-3 requests per second, so I wouldn't think that would be too much strain on FastAPI. The first thing we tried to do is isolate the FastAPI framework from the model itself, and when we performed some tracing, we noticed that a lot of time (6 seconds) was spent on this segment: starlette.exceptions:ExceptionMiddleware.__call__.

The gunicorn configuration we are using didn't seem to help either:

"""gunicorn server configuration."""
import os
​
threads = 2
workers = 4
timeout = 60
keepalive = 1800
graceful_timeout = 1800
bind = f":{os.environ.get('PORT', '80')}"
worker_class = "uvicorn.workers.UvicornWorker"

Would really appreciate some guidance on what the above segment implies and what is causing timeout issues for some requests under a not too strenuous load.

[screenshots: request trace expanded to show per-segment timings]

Upvotes: 7

Views: 10235

Answers (1)

Bastien B

Reputation: 1313

guidance on what the above segment implies

Here you have the official gunicorn example config file, with a lot of explanations included.

Since you use gunicorn to manage uvicorn workers, forcing your timeout to 60 seconds should work just fine for long-running tasks (although you should think about using an asynchronous task queue or job queue like Celery).

But what is your route returning? The first thing to do would be to look at the error thrown by your API.
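
If the error is not visible in your logs yet, here is a minimal sketch (assuming your FastAPI app object lives in a single module) that logs every unhandled exception raised by a route, so you can see what the API actually returns:

import logging

from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

logger = logging.getLogger("uvicorn.error")

app = FastAPI()

@app.exception_handler(Exception)
async def log_unhandled_exceptions(request: Request, exc: Exception):
    # Log the full traceback so the real cause of failing requests shows up in the logs.
    logger.error("Unhandled error on %s %s", request.method, request.url.path, exc_info=exc)
    return JSONResponse(status_code=500, content={"detail": "Internal server error"})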

starlette.exceptions:ExceptionMiddleware.__call__

Since you have expanded the list, you can see that what takes the most time (as expected) is not FastAPI nor Starlette but your function in app.api.routes.predictions.

so I wouldn't think that would be too much strain on FastAPI

It is not too much strain on FastAPI, since FastAPI is barely involved in processing your request. Remember that FastAPI is "just" a framework, so when your function takes time, it is your function/implementation that is at fault.

Here it can be one or a combination of these things that cause long-running requests:

  • a sync route
  • blocking I/O calls or processing inside your route function (see the sketch after this list)
  • a prediction algorithm that takes a lot of time (maybe too much)
  • a worker class configuration that doesn't fit your type of workload
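
For the first two points, here is a minimal sketch (the predict function and PredictIn model are hypothetical stand-ins) of how to keep a blocking model call from stalling the event loop: a plain def route is run in a threadpool by FastAPI, but if you declare async def and call blocking code directly, every other request waits behind it.

import time

from fastapi import FastAPI
from fastapi.concurrency import run_in_threadpool
from pydantic import BaseModel

app = FastAPI()

class PredictIn(BaseModel):
    text: str

def predict(text: str) -> dict:
    # Stand-in for the real model call: CPU-bound / blocking work.
    time.sleep(5)
    return {"label": "example", "input": text}

@app.post("/predictions")
async def predictions(body: PredictIn):
    # Offloading the blocking call keeps the event loop free for concurrent requests.
    return await run_in_threadpool(predict, body.text)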

AI or NLP work often takes a lot of processing time, and when integrating such models into an API you usually rely on a task queue like Celery. If your API is not at fault and your route is not returning an error but simply takes a lot of time, you should look at implementing a task queue.
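
Here is a minimal sketch of that pattern, assuming Redis as broker/backend and hypothetical names (run_prediction, PredictIn): the heavy inference moves into a Celery worker, and the API only enqueues the job and lets the client poll for the result, so requests are no longer held open for the whole prediction.

from celery import Celery
from fastapi import FastAPI
from pydantic import BaseModel

celery_app = Celery(
    "predictions",
    broker="redis://localhost:6379/0",
    backend="redis://localhost:6379/1",
)

@celery_app.task
def run_prediction(text: str) -> dict:
    # Heavy model inference runs in the Celery worker process, not in the API process.
    return {"label": "example", "input": text}

app = FastAPI()

class PredictIn(BaseModel):
    text: str

@app.post("/predictions")
async def create_prediction(body: PredictIn):
    # Enqueue the job and return immediately with an id the client can poll.
    task = run_prediction.delay(body.text)
    return {"task_id": task.id}

@app.get("/predictions/{task_id}")
async def get_prediction(task_id: str):
    result = celery_app.AsyncResult(task_id)
    return {"status": result.status, "result": result.result if result.ready() else None}

The worker runs as a separate process (started with the celery worker command), so the gunicorn worker timeout no longer bounds your prediction time.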

Upvotes: 2
