user_12

Reputation: 2119

How to deploy a scalable API using FastAPI?

I have a complex API that takes around 7 GB of memory when I deploy it using Uvicorn.

I want to understand how to deploy it so that I can make parallel requests from my end. The deployed API should be capable of processing two or three requests at the same time.

I am using FastAPI with Uvicorn and nginx for deployment. Here is my deployment command:

    uvicorn --host 0.0.0.0 --port 8888

Can someone provide some clarity on how people achieve that?

Upvotes: 2

Views: 5142

Answers (2)

Nick Green

Reputation: 121

I'm working on something like this using Docker and NGINX.

There's an official Docker image, created by the developer of FastAPI, that runs Uvicorn managed by Gunicorn for you and can be configured to your needs:

https://hub.docker.com/r/tiangolo/uvicorn-gunicorn-fastapi
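A detail that may matter for a 7 GB app: as far as I can tell from that image's documentation, it can be tuned with environment variables, and WEB_CONCURRENCY should pin the exact number of worker processes (each worker holds its own copy of the app in memory). A hypothetical run command, with a made-up image name and port mapping:

    # Hypothetical: cap the container at exactly 2 workers to bound memory use
    docker run -d -p 8080:80 -e WEB_CONCURRENCY=2 myapp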

It took some time to get the hang of Docker, but I'm really liking it now. You can build an nginx image using the configuration below, then build your app into as many separate containers as you need to serve as hosts.

The example below runs a weighted load balancer across two of my app services, with a third as a backup in case the first two fail.

nginx Dockerfile:

    FROM nginx
    
    # Remove the default nginx.conf
    RUN rm /etc/nginx/conf.d/default.conf
    
    # Replace with our own nginx.conf
    COPY nginx.conf /etc/nginx/conf.d/

nginx.conf:

    # Send ~5x as many requests to the first server; the third one
    # only receives traffic when both primaries are unavailable.
    upstream loadbalancer {
        server 192.168.115.5:8080 weight=5;
        server 192.168.115.5:8081;
        server 192.168.115.5:8082 backup;
    }

    server {
        listen 80;

        location / {
            # Forward every request to the upstream group above
            proxy_pass http://loadbalancer;
        }
    }

app Dockerfile:

    FROM tiangolo/uvicorn-gunicorn-fastapi:python3.7

    RUN pip install --upgrade pip

    WORKDIR /app

    # Install dependencies first so this layer is cached between builds
    COPY requirements.txt .
    RUN pip install -r requirements.txt

    # Copy the application code into the image
    COPY . /app
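To show how the pieces might fit together, here is a sketch of building and starting the containers; the image tags, file names, and container names (myapp, mynginx, app1, ...) are placeholders I made up, and the host ports are chosen to match the upstream addresses in the nginx.conf above:

    # Build the two images (filenames and tags are hypothetical)
    docker build -t myapp -f app.Dockerfile .
    docker build -t mynginx -f nginx.Dockerfile .

    # Start three app containers on the host ports nginx.conf expects
    # (the tiangolo image serves on port 80 inside the container)
    docker run -d --name app1 -p 8080:80 myapp
    docker run -d --name app2 -p 8081:80 myapp
    docker run -d --name app3 -p 8082:80 myapp

    # Start the nginx load balancer in front of them
    docker run -d --name lb -p 80:80 mynginx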

Upvotes: 2

Abel Rodríguez

Reputation: 652

You can use Gunicorn instead of plain Uvicorn to manage your backend. Gunicorn spawns multiple worker processes and load-balances the arriving requests across them, so you will have as many running Gunicorn worker processes as you specify, each able to receive and process requests. From the docs, Gunicorn should only need 4-12 worker processes to handle hundreds or thousands of requests per second. However, the number of workers should be no more than (2 x number_of_cpu_cores) + 1 to avoid out-of-memory errors; keep in mind that each worker is a separate process holding its own copy of the application, so with a 7 GB app the memory footprint multiplies with the worker count. You can check this out in the docs.
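As an illustration of that formula, a small sketch (assuming a Linux host where nproc is available) that derives the worker count from the core count at launch time:

    # (2 x cores) + 1 workers, per the rule of thumb above
    WORKERS=$(( 2 * $(nproc) + 1 ))
    gunicorn main:app -w "$WORKERS" -k uvicorn.workers.UvicornWorker -b "0.0.0.0:8888"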

For example, if you want to use 4 workers for your FastAPI-based backend, you can specify it with the -w flag:

    gunicorn main:app -w 4 -k uvicorn.workers.UvicornWorker -b "0.0.0.0:8888"

In this case, the script containing my backend functionality is called main.py, and the FastAPI instance is named app.
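For completeness, a minimal sketch of what that main.py could look like; the endpoint is a made-up placeholder:

    # main.py
    from fastapi import FastAPI

    app = FastAPI()  # the "app" that gunicorn resolves from "main:app"

    @app.get("/")
    def read_root():
        # Placeholder route; the real API logic would live here
        return {"status": "ok"}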

Upvotes: 1
