Reputation: 31
I have a sample FastAPI application with a single endpoint that returns a result after a delay of 5 seconds. Please find the code below.
from fastapi import FastAPI
import uvicorn, os, time

app = FastAPI()

@app.get("/delayed-response")
async def read_root():
    time.sleep(5)
    return {"message": f"This is a delayed response!- {os.getpid()}"}

if __name__ == "__main__":
    uvicorn.run(
        "main:app",
        host='127.0.0.1',
        port=9090,
        reload=False,
        workers=10,  # just having 10 workers to understand the concept
    )
Now I also have a script that will be sending parallel requests to the endpoint http://localhost:9090/delayed-response
Upon starting the application, I can see that all 10 workers, with process IDs 9 to 18, are started successfully.
When I send 20 parallel requests to this endpoint, I observe that only the first few requests are handled in parallel by the workers; after a certain point, a single worker handles all the remaining requests.
Attaching a few screenshots of the response:
Can anyone explain this behavior?
Note: Let us consider this application to be a synchronous application. I understand that there are a few concepts to make it work in a concurrent way, but at this point I wish to understand this behavior.
My expectation: when 10 workers are configured and 30 requests are sent in parallel, the 30th request to an endpoint with a 5 s delay should get its response at around the 15th to 17th second.
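That expectation can be written as a small back-of-envelope calculation, assuming each worker handles exactly one request at a time and requests are dispatched evenly:

```python
import math

workers = 10          # uvicorn worker processes
requests_total = 30   # parallel requests sent
handler_delay = 5     # seconds each request takes

batches = math.ceil(requests_total / workers)        # 3 batches of 10
expected_last_response = batches * handler_delay     # ~15 seconds
print(expected_last_response)
```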
Upvotes: 1
Views: 4095
Reputation: 41
Note that you're mixing an async handler (async def read_root) with a blocking (synchronous) call to time.sleep().
FastAPI can handle blocking handlers via its internal thread pool, but to do so you need to define a regular handler (def read_root).
If you keep an async handler, you should use await asyncio.sleep(5) instead.
See here: https://docs.python.org/3/library/asyncio-task.html#task-groups and here: https://fastapi.tiangolo.com/async/#in-a-hurry
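To see why the blocking call matters, here is a stdlib-only sketch (no FastAPI, and 0.2 s standing in for the 5 s delay) contrasting time.sleep with asyncio.sleep inside an event loop. Running five "handlers" concurrently, the blocking version takes ~1 s (sequential), the non-blocking one ~0.2 s:

```python
import asyncio
import time

async def blocking_handler():
    # time.sleep blocks the whole event loop: no other coroutine can run
    time.sleep(0.2)

async def non_blocking_handler():
    # asyncio.sleep yields control back to the loop while waiting
    await asyncio.sleep(0.2)

async def timed(handler, n=5):
    # run n copies of the handler "concurrently" and measure total time
    start = time.perf_counter()
    await asyncio.gather(*(handler() for _ in range(n)))
    return time.perf_counter() - start

blocking_total = asyncio.run(timed(blocking_handler))          # ~1.0 s
non_blocking_total = asyncio.run(timed(non_blocking_handler))  # ~0.2 s
print(f"blocking: {blocking_total:.2f}s, non-blocking: {non_blocking_total:.2f}s")
```

This is exactly what happens inside each uvicorn worker: with time.sleep, the worker's event loop can serve only one request at a time.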
Upvotes: 1
Reputation: 11
When requests arrive faster than the workers can handle them, some requests have to wait in a queue. This waiting can happen because of resource limits or because too many requests are contending for the same resources at once. In your situation, since your handler processes requests one after the other, each request must finish before the next one can start. So if too many requests come in at once, they start stacking up, and eventually a single worker ends up handling the whole backlog.
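If the 10 workers really did pick up requests independently in parallel, the timing would match a simple thread pool. A scaled-down sketch (0.05 s standing in for the 5 s handler delay): 30 tasks across 10 workers finish in ~3 batches, i.e. ~0.15 s, rather than 30 × 0.05 s sequentially:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def handle(_):
    # stand-in for the 5 s handler delay in the question
    time.sleep(0.05)

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=10) as pool:
    list(pool.map(handle, range(30)))  # 30 "requests", 10 "workers"
elapsed = time.perf_counter() - start
print(f"{elapsed:.2f}s")  # roughly 3 batches x 0.05 s
```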
Upvotes: 1