Nishant Adhikari

Reputation: 59

How to run RASA Server in multi thread mode using gunicorn

The Rasa server runs fine in single-threaded mode: python -m rasa_nlu.server --path projects --emulate dialogflow --response_log logs

I would like to run it on the server with multi-threading enabled. Following the Rasa documentation (https://nlu.rasa.com/0.8.12/http.html), I am running the command below.

gunicorn -w 4 --threads 12 -k gevent -b 127.0.0.1:5000 rasa_nlu.wsgi

This gives me the error below.

Log

Please suggest.

Upvotes: 3

Views: 2394

Answers (1)

Caleb Keller

Reputation: 2161

This is no longer possible. The Rasa documentation you are pointing to is for version 0.8; Rasa NLU is now on version 0.12. Support for this was removed for several reasons, primarily:

  • High memory usage for language models
  • Move from Flask to Klein for asynchronous training

Here's a GitHub issue with more information: https://github.com/RasaHQ/rasa_nlu/issues/793

If you are after higher overall throughput of /parse requests, the recommendation is to use Docker combined with nginx to run multiple instances on the same server - if the server is large enough to handle it - or to run multiple smaller instances, still behind an nginx reverse proxy.
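As a rough sketch of the nginx side of that setup, assuming three Rasa NLU containers published on local ports 5001-5003 (the ports and upstream name are placeholders, not anything Rasa-specific):

```nginx
# Hypothetical config: round-robin load balancing of /parse
# requests across several single-process Rasa NLU instances.
upstream rasa_nlu {
    server 127.0.0.1:5001;
    server 127.0.0.1:5002;
    server 127.0.0.1:5003;
}

server {
    listen 80;
    location / {
        proxy_pass http://rasa_nlu;
    }
}
```

Each backend keeps its own copy of the language model in memory, which is why this approach only works if the server has RAM to spare for every instance.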

Note that training has already been moved into separate processes. The number of processes available for training can be set with the --max_training_processes argument. Some components of the Rasa pipeline also support multiple threads; the number of threads available to those components can be set with the --num_threads argument.
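Putting those two arguments together, a server invocation might look like the following (the specific values are illustrative, not recommendations):

```shell
# Sketch: start the Rasa NLU server with 2 training worker
# processes and 8 threads for pipeline components that support
# multi-threading.
python -m rasa_nlu.server \
    --path projects \
    --max_training_processes 2 \
    --num_threads 8
```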

Upvotes: 2
