I have the following code in a FastAPI route handler; client is an aiohttp.ClientSession(). The service is a singleton, so every request goes through the same class instance that holds this client.
async def handler():
    log...  # "start" log
    async with client.post(
        f"{config.TTS_SERVER_ENDPOINT}/v2/models/{self.MODEL_NAME}/infer",
        json=request_payload,
    ) as response:
        response_data = await response.json()  # ✅ Get JSON response
    log...  # "finish" log
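Since all requests share one session, one thing I wondered about is whether a shared connection pool could cap throughput. Here is a stdlib-only sketch (not my service code; the pool size and latency numbers are made up) of how a small pool serializes otherwise-concurrent requests:

```python
import asyncio
import time

async def call_backend(pool: asyncio.Semaphore):
    # Each "request" must acquire a pooled connection before it can start.
    async with pool:
        await asyncio.sleep(0.05)  # simulated backend latency

async def run(pool_size: int, n_requests: int) -> float:
    pool = asyncio.Semaphore(pool_size)
    start = time.perf_counter()
    await asyncio.gather(*(call_backend(pool) for _ in range(n_requests)))
    return time.perf_counter() - start

# 20 concurrent requests through a pool of 2 run in ~10 batches of 0.05 s;
# through a pool of 20 they all overlap and finish together.
small = asyncio.run(run(2, 20))
large = asyncio.run(run(20, 20))
print(f"pool=2: {small:.2f}s, pool=20: {large:.2f}s")
```

I don't know whether my session's pool is actually the limit here; this just shows the shape of the effect.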
I am load testing the system, and both the logs and the JMeter results show that I am only handling 2-3 requests per second. Is that reasonable?
I would expect to see many "start" messages followed by many "finish" messages, but this is not the case.
Instead, the interval between a request's start and finish logs keeps growing under load, from about 0.5 seconds up to 5-6 seconds. What could be the bottleneck here?
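To make the symptom concrete, here is a stdlib-only sketch (not my service code; the timings are made up) of how the start-to-finish interval grows when something blocks the event loop, versus staying flat when the work is truly awaited:

```python
import asyncio
import time

async def handler(results, blocking):
    start = time.perf_counter()    # the "start" log
    await asyncio.sleep(0)         # let every request begin, like a real server
    if blocking:
        time.sleep(0.05)           # blocking call: stalls the whole event loop
    else:
        await asyncio.sleep(0.05)  # true await: other handlers keep running
    results.append(time.perf_counter() - start)  # the "finish" log

async def run(blocking):
    results = []
    await asyncio.gather(*(handler(results, blocking) for _ in range(10)))
    return results

blocked = asyncio.run(run(True))
awaited = asyncio.run(run(False))
# When the loop is blocked, intervals grow with load (the last request waits
# for all the earlier ones); when the work is awaited, they stay ~0.05 s.
print(f"blocked max: {max(blocked):.2f}s, awaited max: {max(awaited):.2f}s")
```

My handler looks fully async to me, which is why the growing intervals confuse me.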
I am running FastAPI in Docker with one CPU and 2 GB of memory, started with this command:
CMD ["uv", "run", "gunicorn", "-k", "uvicorn.workers.UvicornWorker", "-w", "4", "--threads", "16","--worker-connections", "2000", "-b", "0.0.0.0:8000","--preload", "src.main:app"]
where uv is the package manager I am using.
What is going on here? Handling so few requests per second does not seem reasonable to me.