Reputation: 753
While stress testing a Rasa agent before deploying it to production, we found that it only supports 23 requests per second at a response time of 1 second.
If we push the load beyond 23 requests per second, the response time climbs gradually until it exceeds 5 seconds, regardless of the hardware.
Is there any way to eliminate this limit?
I am on Rasa version 2.1.2
Upvotes: 3
Views: 500
Reputation: 216
We ran a similar exercise for kAIron, which uses Rasa 2.1.2 internally, and could not get past 23 req/sec either. Even with our own chat server implementation built on Tornado, which is still under development, we reach at most 32 req/sec with response times of up to 1 second.
In our experience, one option for achieving more concurrency is to deploy the Rasa chat server on Kubernetes with horizontal scaling.
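For a rough idea of how many replicas horizontal scaling would take, here is a minimal back-of-the-envelope sketch. It assumes throughput grows linearly with the number of replicas, which may not hold if a shared component (tracker store, load balancer, action server) becomes the bottleneck. The function name `replicas_needed` and the `headroom` parameter are made up for illustration and are not part of Rasa or Kubernetes.

```python
import math


def replicas_needed(target_rps: float, per_replica_rps: float, headroom: float = 0.0) -> int:
    """Estimate how many replicas are needed to serve target_rps.

    Assumption: each replica sustains per_replica_rps independently,
    so total capacity is replicas * per_replica_rps (linear scaling).
    headroom is the fraction of each replica's capacity kept in reserve.
    """
    if per_replica_rps <= 0:
        raise ValueError("per_replica_rps must be positive")
    if not 0.0 <= headroom < 1.0:
        raise ValueError("headroom must be in [0, 1)")
    if target_rps <= 0:
        return 0
    # Capacity each replica is allowed to use after reserving headroom.
    usable_rps = per_replica_rps * (1.0 - headroom)
    return math.ceil(target_rps / usable_rps)


# 23 req/sec per instance is the figure measured above; 100 req/sec is a made-up target.
print(replicas_needed(100, 23))                # 5
print(replicas_needed(100, 23, headroom=0.2))  # 6
```

Treat the result as a starting replica count and confirm it with another stress test against the scaled deployment.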
Upvotes: 3