Reputation: 327
I have a question regarding Flask, Waitress and parallel processing of HTTP requests.
I have read that Flask alone can only process one HTTP request at a time.
In the table below I have put all the possible configurations, and I would like your feedback on the number of HTTP requests I can process in parallel.
| | Only Flask | Flask and Waitress |
|------------------------|------------|--------------------|
| 1 CPU & 1 core | 1 request | 1 request |
| 1 CPU & 4 cores | 1 request | 4 requests |
| 2 CPUs & 1 core each | 1 request | 2 requests |
| 2 CPUs & 4 cores each | 1 request | 8 requests |
I ask these questions because a colleague told me that we can process several thousand HTTP requests with an Apache server with only 1 CPU and 1 core!
So, how should I handle the maximum number of HTTP requests in parallel?
Upvotes: 1
Views: 1965
Reputation: 914
Let me clear up the confusion for you.
When you run Flask locally during development, you use the built-in development server, which is single-threaded: it processes only one request at a time. This is one of the reasons you shouldn't simply set FLASK_ENV=production
and run the built-in server in a production environment; it isn't capable of handling those workloads. Once you change FLASK_ENV
to production and run it, you'll see a warning in the terminal.
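For reference, here is a minimal sketch of the kind of app the rest of this answer assumes; the module name app.py and the object name app are just placeholders for your own project:

```python
# app.py -- minimal Flask app used as a placeholder in the examples below
from flask import Flask

app = Flask(__name__)

@app.route("/")
def index():
    return "Hello from Flask!"

if __name__ == "__main__":
    # Starts the built-in development server, which is not intended for production.
    app.run()
```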
Now, coming to how to run Flask in a production environment: CPUs, cores, threads, and the rest.
To run Flask in a production environment, you need a proper application server to run your Flask application. This is where Gunicorn comes in: it is compatible with Flask and one of the most popular ways of running it.
Gunicorn has several settings you can tune to the specs of your server to get an optimal configuration. You can work it out as follows:
The way you calculate the maximum number of concurrent requests is as follows, taking a 4-core server as an example:
As per the Gunicorn documentation, the optimal number of workers
is suggested as (2 * num_of_cores) + 1,
which in this case becomes (2 * 4) + 1 = 9.
The suggested number of threads per worker is 2 to 4 x num_of_cores,
which in this case is at most 4 * 4 = 16.
So now you have 9 workers with 16 threads each. Each thread can handle one request at a time, so you can have 9 * 16 = 144 concurrent requests.
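If it helps, here is a rough sketch of how those numbers could be expressed in a gunicorn.conf.py file (the bind address and file name are just examples; adjust them to your setup):

```python
# gunicorn.conf.py -- sketch applying the worker/thread formula above
import multiprocessing

cores = multiprocessing.cpu_count()   # 4 on the example server
workers = (2 * cores) + 1             # (2 * 4) + 1 = 9 workers
threads = 4 * cores                   # upper end of the 2-4 x cores range = 16 threads per worker
bind = "127.0.0.1:8000"               # keep Gunicorn behind a reverse proxy (see below)
```

You would then start the server with something like `gunicorn -c gunicorn.conf.py app:app`, assuming the placeholder app.py module from above.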
Similarly, you can work out the numbers for Waitress. I prefer using Gunicorn, so you'll need to check the Waitress docs for the equivalent configuration.
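As a rough equivalent (Waitress runs a single process with a thread pool, so only the thread count applies), a sketch using the same placeholder module might look like this:

```python
# serve_waitress.py -- sketch of serving the same Flask app with Waitress
from waitress import serve

from app import app  # the placeholder Flask app from above

# Waitress is single-process; `threads` sets the size of its worker thread pool.
serve(app, host="127.0.0.1", port=8080, threads=16)
```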
Now, coming to web servers.
Until now, what you have configured is an application server to run Flask. This works, but you shouldn't expose an application server directly to the internet. Instead, it's always suggested to deploy Flask behind a reverse proxy like Nginx. Nginx acts as a full-fledged web server capable of handling real-world workloads.
So, in a gist, you can pick a combination from the list below as per your requirements:
Flask + application server + web server, where the application server is one of Gunicorn, uWSGI, Gevent, Twisted Web, Waitress, etc., and the web server is one of Nginx, Apache, Traefik, Caddy, etc.
Upvotes: 5