x7qiu

Reputation: 123

How to keep dependencies consistent between dev and production env for a python package with poetry and pyproject.toml

I have a python package where the dependencies are specified in a poetry.lock file for development and testing. I then build and publish the package, which gets installed on the production docker image. But here is the problem: the published package has its dependencies specified in the tool.poetry.dependencies section of pyproject.toml, which could differ from poetry.lock. So it's possible that the production env ends up with dependencies different from the testing env.
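For example (the package name and version numbers below are made up just for illustration), pyproject.toml usually declares a range while poetry.lock records the exact version that dev/test ran against:

pyproject.toml (excerpt)

[tool.poetry.dependencies]
python = "^3.9"
requests = "^2.25"   # any 2.x release >= 2.25 satisfies this

poetry.lock (excerpt)

[[package]]
name = "requests"
version = "2.25.1"

A production install of the published package only sees the range, so it might pull in a newer 2.x release than the one that was tested.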

I can think of several ways to achieve consistency, but none of them seems that good to me:

  1. Pin the exact same versions in pyproject.toml as in poetry.lock. This guarantees the published package has the same dependencies as dev/test. But then what's the point of keeping a poetry.lock file at all, since poetry install can work from pyproject.toml alone when there is no lock file? I think this works, but then I don't understand why poetry.lock exists in the first place.

  2. In the production docker image, check out the poetry.lock file from the package repo and run poetry install before installing the package itself (a rough sketch of this is shown after the list). But this will increase the docker image size, introduce unnecessary config if the repo is private, and overall doesn't seem natural.
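For illustration, option 2 could look roughly like the Dockerfile below; the repo URL and package name are placeholders, and it assumes poetry gets installed into the image:

Dockerfile (sketch of option 2)

FROM python:3.9-slim

# poetry and git are only needed to resolve the lock file
RUN pip install poetry \
 && apt-get update && apt-get install -y git

# fetch the project just for its pyproject.toml / poetry.lock
RUN git clone https://example.com/my-org/my-package.git /tmp/my-package
WORKDIR /tmp/my-package

# install the locked dependencies into the system environment,
# but not the project itself
RUN poetry config virtualenvs.create false \
 && poetry install --no-dev --no-root

# finally install the published package on top of the pinned dependencies
RUN pip install my-package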

I'm pretty new to this part of Python so maybe one of these is the "standard" workflow. Or maybe I'm just completely missing something. Thanks for answering!

Upvotes: 2

Views: 3116

Answers (1)

Arne

Reputation: 20157

Option 1: Nailing down your dependency versions as you describe in option 1 is not advisable, since it makes the package unnecessarily strict. This will often lead to avoidable version conflicts, especially if the package you're writing is also an internal dependency of other projects.

Option 2: Handling dependencies like this is definitely better than option 1, but harder to maintain than the option I want to propose. As a side note, it also requires poetry to be installed in your docker image - you only really need pip if all you want to do is install packages.
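To illustrate that side note (the requirements file and package name here are placeholders): you can run poetry export wherever poetry and poetry.lock live, and let plain pip install the pinned versions inside the image:

Dockerfile (pip-only sketch)

# beforehand, outside the image:
#   poetry export -f requirements.txt -o requirements.txt

FROM python:3.9-slim

COPY requirements.txt .
# install the exact versions recorded in poetry.lock - no poetry in the image
RUN pip install -r requirements.txt

# install the package itself without re-resolving its dependencies
RUN pip install --no-deps my-package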

Option 3: Create a wheelhouse at the beginning of your build pipeline, and use it in subsequent steps to install runtime dependencies. This ensures that there is no possible discrepancy between tested code and deployed code, and it's really fast because there is no downloading from the internet or building of source-only distributions. I'll use a sample .gitlab-ci.yml to show what I mean, but the concept should translate without too many issues to any other CI/CD system:

.gitlab-ci.yml

image: acaratti/pypoet
# a simple python:slim image that comes with poetry preinstalled

stages:
  - build
  - test
  - release

variables:
  WHEELHOUSE: wheelhouse 
  POETRY_VIRTUALENVS_PATH: venv  # speeds up jobs
  IMAGE_NAME: my-app

wheels:
  stage: build
  script:
    - poetry install
    - poetry build -f wheel
    # freeze the locked runtime dependencies into a requirements file
    - poetry export -f requirements.txt -o requirements.txt
    # build/download a wheel for every locked dependency
    - poetry run pip wheel -w ${WHEELHOUSE} -r requirements.txt
    # add the package's own wheel to the wheelhouse
    - mv dist/* ${WHEELHOUSE}
  artifacts:
    expire_in: 1 week
    paths:
      - ${WHEELHOUSE}
      - ${POETRY_VIRTUALENVS_PATH}

pytest:
  stage: test
  script:
    # no need to run `poetry install` because 
    # the venv from the build-job gets re-used
    - poetry run pytest

dockerize:
  stage: release
  image: docker:git
  script:
    - docker build . -t ${IMAGE_NAME}
    - docker push ${IMAGE_NAME}

If you have such a wheelhouse available during dockerization, the Dockerfile itself can often be as simple as this:

Dockerfile

FROM python:3.9-slim

COPY wheelhouse/* wheelhouse/

# install the app plus all pinned dependencies from the local wheels
RUN pip install wheelhouse/*

ENTRYPOINT ["run", "my", "app"]

Caveats

The image that you use for the wheels job needs to run the same platform as the base image in your Dockerfile (or as any job that tries to install the wheelhouse or re-use the virtual environment) - if you use debian for your gitlab jobs but alpine in the prod image, things will break apart real quick.

This also extends to building the image locally, if that's something you'd want to do during development. If your workstation runs a different platform, e.g. ubuntu, you might not be able to do that any more. Here is a recipe that creates a working debian-based wheelhouse on your workstation.
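Roughly sketched, and assuming Docker is available on the workstation and the same debian-based python:3.9-slim image as above (the mount path and script name are placeholders), such a recipe could look like this:

build-wheelhouse.sh (sketch)

# run the export and wheel-building inside a debian-based container,
# so the resulting wheels match the python:3.9-slim runtime image
docker run --rm -v "$(pwd)":/src -w /src python:3.9-slim bash -c '
  pip install poetry poetry-plugin-export &&
  poetry export -f requirements.txt -o requirements.txt &&
  pip wheel -w wheelhouse -r requirements.txt &&
  poetry build -f wheel &&
  mv dist/* wheelhouse/
'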

Upvotes: 3
