Gunicorn kills workers while using oracledb + sqlalchemy

Question

I had my FastAPI application running for longer period of time, but recently I've added SQLAlchemy to connect to Oracle database. While I'm running app locally everything is working fine, but as soon as I try to run it on Gunicorn using Docker + K8s I get issues with Gunicorn workers. It seems that some of the workers are getting killed all the time, while some are running serving requests. Sometimes it comes to the point where whole image is killed and restarted. Only changes made between working and this problematic versions of application are adding SQLAlchemy and oracledb library. Gunicorn config stayed the same, dockerfile is the same as well.

Here is my dockerfile:

FROM python:3.10

EXPOSE 8000

WORKDIR /app
RUN mkdir -p /app
COPY . /app

RUN pip install poetry
RUN poetry config virtualenvs.create false
RUN poetry install

COPY certs/ /usr/local/share/ca-certificates
RUN update-ca-certificates

ENV REQUESTS_CA_BUNDLE "/etc/ssl/certs/ca-certificates.crt"

CMD [ "gunicorn", "wsgi:app", "--bind", "0.0.0.0:8000", "-w 4", "--access-logfile", "-", "--log-level", "debug", "-k uvicorn.workers.UvicornWorker", "-t 90"]

this is how I create engine:

from sqlalchemy import Engine, create_engine

import config


def get_engine() -> Engine:
    connection_url = f"oracle+oracledb://{config.DB_USERNAME}:{config.DB_PASSWORD}@{config.DB_HOST}:{config.DB_PORT}/?service_name={config.DB_SERVICE_NAME}"

    engine = create_engine(connection_url)
    return engine

so it seems that even if there would be any problem with SqlAlchemy or DB connection I shouldn't get any issues before I actually try to use this engine, and engine is used only in endpoints through repository... My thoughts are that maybe oracledb library is causing issues, but cannot find any related information to that and I'm not getting any significant messages from gunicorn logs:

[2023-02-13 00:02:27 +0000] [1] [WARNING] Worker with pid 203 was terminated due to signal 9
[2023-02-13 00:02:27 +0000] [207] [INFO] Booting worker with pid: 207
[2023-02-13 00:02:35 +0000] [205] [INFO] Started server process [205]
[2023-02-13 00:02:35 +0000] [205] [INFO] Waiting for application startup.
[2023-02-13 00:02:35 +0000] [205] [INFO] Application startup complete.
some-ip:some-port - "GET /status HTTP/1.1" 200
some-ip:some-port - "GET /status HTTP/1.1" 200
[2023-02-13 00:02:38 +0000] [1] [WARNING] Worker with pid 197 was terminated due to signal 9
[2023-02-13 00:02:38 +0000] [209] [INFO] Booting worker with pid: 209
[2023-02-13 00:02:45 +0000] [207] [INFO] Started server process [207]
[2023-02-13 00:02:45 +0000] [207] [INFO] Waiting for application startup.
[2023-02-13 00:02:45 +0000] [207] [INFO] Application startup complete.
some-ip:some-port - "GET /status HTTP/1.1" 200
some-ip:some-port - "GET /status HTTP/1.1" 200
[2023-02-13 00:02:47 +0000] [1] [WARNING] Worker with pid 205 was terminated due to signal 9
[2023-02-13 00:02:47 +0000] [212] [INFO] Booting worker with pid: 212
[2023-02-13 00:02:55 +0000] [209] [INFO] Started server process [209]
[2023-02-13 00:02:55 +0000] [209] [INFO] Waiting for application startup.
[2023-02-13 00:02:55 +0000] [209] [INFO] Application startup complete.
some-ip:some-port - "GET /status HTTP/1.1" 200
some-ip:some-port - "GET /status HTTP/1.1" 200
[2023-02-13 00:02:57 +0000] [1] [WARNING] Worker with pid 207 was terminated due to signal 9
[2023-02-13 00:02:57 +0000] [214] [INFO] Booting worker with pid: 214
[2023-02-13 00:03:04 +0000] [212] [INFO] Started server process [212]
[2023-02-13 00:03:04 +0000] [212] [INFO] Waiting for application startup.
[2023-02-13 00:03:04 +0000] [212] [INFO] Application startup complete.
some-ip:some-port - "GET /status HTTP/1.1" 200
[2023-02-13 00:03:06 +0000] [1] [WARNING] Worker with pid 209 was terminated due to signal 9
[2023-02-13 00:03:06 +0000] [217] [INFO] Booting worker with pid: 217
[2023-02-13 00:03:14 +0000] [214] [INFO] Started server process [214]
[2023-02-13 00:03:14 +0000] [214] [INFO] Waiting for application startup.
[2023-02-13 00:03:14 +0000] [214] [INFO] Application startup complete.
some-ip:some-port- "GET /status HTTP/1.1" 200
some-ip:some-port - "GET /status HTTP/1.1" 200
[2023-02-13 00:03:17 +0000] [1] [WARNING] Worker with pid 212 was terminated due to signal 9
[2023-02-13 00:03:17 +0000] [219] [INFO] Booting worker with pid: 219
[2023-02-13 00:03:25 +0000] [217] [INFO] Started server process [217]
[2023-02-13 00:03:25 +0000] [217] [INFO] Waiting for application startup.
[2023-02-13 00:03:25 +0000] [217] [INFO] Application startup complete.
some-ip:some-port - "GET /status HTTP/1.1" 200
some-ip:some-port - "GET /status HTTP/1.1" 200

Before my docker image was getting killed after ~1 min because it was still in unready status, so I've changed gunicorn timeout from default 30 seconds to 90 seconds - it helped a bit because now some workers are running and some are being killed, but application is able to handle any requests at all. I've also increased logging level to debug, but there isn't any helpful information there.

Any ideas?

Gunicorn kills workers while using oracledb + sqlalchemy

Answers (1)

Related Questions