gowerc

Reputation: 1099

Advice for how to manage python modules in docker?

I am after advice on how to manage python modules within the context of docker.

Current options that I'm aware of include:

  1. Installing them individually via pip in the build process
  2. Installing them together via pip in the build process using a requirements.txt file
  3. Installing them to a volume and adding the volume to the PYTHONPATH

Ideally I want a solution that is fully reproducible and that doesn't require every module to be re-installed whenever I add another module or update the version of one of them.

From my perspective:
(2) is an issue because the docker ADD command (needed to get access to the requirements.txt file) apparently invalidates the cache, meaning that any change to the file forces everything to be re-built / re-installed every time you build the image (see the sketch below).
(1) keeps the cache intact but means you'd need to specify the exact version of each package (and potentially of their dependencies?), which seems pretty tedious and error-prone.
(3) is currently my personal favorite as it allows the packages to persist between images/builds and allows a requirements.txt to be used. The only downside is that you are essentially storing the packages on your local machine rather than in the image, which makes the container dependent on the host OS and kind of defeats the point of a container.
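For concreteness, by (2) I mean something like the following (python:3.6 here is just a placeholder base image); (1) would be the same but with individual RUN pip install somepackage==1.2.3 lines instead of the requirements file:

FROM python:3.6

# Changing requirements.txt invalidates the cache from this point on
COPY requirements.txt .
RUN pip install -r requirements.txt

# Project code is copied in afterwards, so code-only changes
# don't trigger a reinstall
COPY . .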

So yeah, I'm not entirely sure what best practice is here and would appreciate advice.

For reference, there have been other questions on this topic, but I don't feel any of them properly addresses my question:
docker with modified python modules?
Docker compose installing requirements.txt
How can I install python modules in a docker image?

EDIT:
Just some additional notes to give more context. My projects are typically data-analysis focused (rather than software development or web development). I tend to use multiple images (one for Python, one for R, one for the database), with docker-compose managing them all together. So far I've been using a makefile on the host OS to re-build the project from scratch, i.e. something like

some_output.pdf:  some_input.py
    docker-compose run python_container python some_input.py

where the outputs are written to a volume on the host OS.
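The docker-compose.yml is roughly of this shape (apart from python_container, the service and image names here are made up):

version: "3"
services:
  python_container:
    build: ./python
    volumes:
      - ./output:/output     # results land on the host
  r_container:
    build: ./r
    volumes:
      - ./output:/output
  db:
    image: postgres:10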

Upvotes: 2

Views: 3519

Answers (2)

sxm1972

Reputation: 752

Another option is to use the multi-stage build feature: create an intermediate stage that installs the dependencies, then copy the installed packages into the production image (the second build stage). This gives you the benefit of your option 3 as well.

It depends on which step in your build is more expensive and would benefit from caching. Compare the following:

Dockerfile A

FROM ubuntu:16.04

# Install Python, pip etc.
RUN apt-get update && apt-get install -y python3 python3-pip

ADD requirements.txt .
RUN pip3 install -r requirements.txt

# Run my build steps, which are expensive.

Dockerfile B

FROM ubuntu:16.04 AS intermediate

# Install Python, pip etc.
RUN apt-get update && apt-get install -y python3 python3-pip

ADD requirements.txt .
# Install into a standalone directory so it can be copied out below
RUN pip3 install --target=/pip-packages -r requirements.txt

FROM ubuntu:16.04

# Run my build steps, which are expensive.

COPY --from=intermediate /pip-packages/ /pip-packages/

In the first case, touching your requirements.txt forces a full rebuild. In the second case, your expensive build steps are still cached; the intermediate stage still runs, but I assume that is not the expensive step here.
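One thing to note: for the copied packages to be importable in the final image, Python also has to look in that directory, e.g. something like the following (assuming you installed with pip install --target=/pip-packages as above):

# Make the copied packages visible to Python in the final image
ENV PYTHONPATH=/pip-packages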

Upvotes: 0

David Maze

Reputation: 159592

The requirements.txt file is the best option. (Even if changing it does a complete reinstall.)

A new developer starts on your project. They check out your source control repository, say "oh, it's a Python project!", create a virtual environment, run pip install -r requirements.txt, and they're ready to go. A week later they come by and ask "so how do we deploy this?", and since you've wrapped the normal Python setup in Docker, they don't have to go out of their way to use a weird Docker-specific development process.
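Concretely, that first-day workflow is just something like this (the image name is a placeholder):

# ordinary local development, no Docker involved
python3 -m venv venv
. venv/bin/activate
pip install -r requirements.txt

# a week later: building the deployable image from the same file
docker build -t myproject .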

Disaster! Your primary server's hard disk has crashed! You have backups of all of your data, but the application code just gets rebuilt from source control. If you're keeping code in a Docker volume (or a bind-mounted host directory) you need to figure out how to rebuild it, whereas your first two options have that written down in the Dockerfile. This also matters for the new developer from the previous paragraph (who wants to test their image locally before deploying it) and for any sort of cluster-based deployment system (Swarm, Kubernetes) where you'd like to just deploy an image and not also have to deploy the code alongside it, by hand, outside of the deployment system's framework.

Upvotes: 3
