Reputation: 327
I have a question regarding Flask, Waitress and parallel processing of HTTP requests.
I have read that Flask alone can only process one HTTP request at a time.
In the table below I have put all the possible configurations, and I would like your feedback on the number of HTTP requests I can process in parallel.
| | Only Flask | Flask and Waitress |
|------------------------|------------|--------------------|
| 1 CPU & 1 core | 1 request | 1 request |
| 1 CPU & 4 cores | 1 request | 4 requests |
| 2 CPUs & 1 core each | 1 request | 2 requests |
| 2 CPUs & 4 cores each | 1 request | 8 requests |
I ask these questions because a colleague told me that we can process several thousand HTTP requests with an Apache server with only 1 CPU and 1 core!
So, how should I handle the maximum number of HTTP requests in parallel?
Upvotes: 1
Views: 1965
Reputation: 914
Let me clear up the confusion for you.
When you run Flask locally during development, you use the built-in development server, which is single-threaded: it processes only one request at a time. This is one of the reasons you shouldn't simply set FLASK_ENV=production
and run the built-in server in a production environment; it isn't capable of handling those workloads. Once you change FLASK_ENV
to production and run it, you'll see a warning in the terminal.
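For reference, here is a minimal sketch of the kind of app the rest of this answer assumes; the module name app.py and the object name app are just placeholders for your own project:

```python
# app.py -- minimal Flask app used as a placeholder in the examples below
from flask import Flask

app = Flask(__name__)

@app.route("/")
def index():
    return "Hello from Flask!"

if __name__ == "__main__":
    # Starts the built-in development server, which is not intended for production.
    app.run()
```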
Now, coming to how to run Flask in a production environment: CPUs, cores, threads, and the rest.
To run Flask in a production environment, you need a proper application server to run your Flask application. This is where Gunicorn comes in: it is compatible with Flask and one of the most popular ways of running it.
Gunicorn has several settings you can tune to the specs of your server to get an optimal configuration. You can work it out as follows:
The way you calculate the maximum number of concurrent requests is as follows, taking a 4-core server as an example:
As per the Gunicorn documentation, the optimal number of workers
is suggested as (2 * num_of_cores) + 1,
which in this case becomes (2 * 4) + 1 = 9.
The suggested number of threads per worker is 2 to 4 x num_of_cores,
which in this case is at most 4 * 4 = 16.
So now you have 9 workers with 16 threads each. Each thread can handle one request at a time, so you can have 9 * 16 = 144 concurrent requests.
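If it helps, here is a rough sketch of how those numbers could be expressed in a gunicorn.conf.py file (the bind address and file name are just examples; adjust them to your setup):

```python
# gunicorn.conf.py -- sketch applying the worker/thread formula above
import multiprocessing

cores = multiprocessing.cpu_count()   # 4 on the example server
workers = (2 * cores) + 1             # (2 * 4) + 1 = 9 workers
threads = 4 * cores                   # upper end of the 2-4 x cores range = 16 threads per worker
bind = "127.0.0.1:8000"               # keep Gunicorn behind a reverse proxy (see below)
```

You would then start the server with something like `gunicorn -c gunicorn.conf.py app:app`, assuming the placeholder app.py module from above.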
Similarly, you can work out the numbers for Waitress. I prefer using Gunicorn, so you'll need to check the Waitress docs for the equivalent configuration.
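As a rough equivalent (Waitress runs a single process with a thread pool, so only the thread count applies), a sketch using the same placeholder module might look like this:

```python
# serve_waitress.py -- sketch of serving the same Flask app with Waitress
from waitress import serve

from app import app  # the placeholder Flask app from above

# Waitress is single-process; `threads` sets the size of its worker thread pool.
serve(app, host="127.0.0.1", port=8080, threads=16)
```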
Now, coming to web servers.
Until now, what you have configured is an application server to run Flask. This works, but you shouldn't expose an application server directly to the internet. Instead, it's always suggested to deploy Flask behind a reverse proxy like Nginx. Nginx acts as a full-fledged web server capable of handling real-world workloads.
So, in a gist, you can pick a combination from the list below as per your requirements:
Flask + application server + web server, where the application server is one of Gunicorn, uWSGI, Gevent, Twisted Web, Waitress, etc., and the web server is one of Nginx, Apache, Traefik, Caddy, etc.
Upvotes: 5