Charlie Parker

Reputation: 5189

How to use a python library that is constantly changing in a docker image or new container?

I organize my code in a python package (usually in a virtual environment like virtualenv and/or conda) and then usually call:

python <path_to/my_project/setup.py> develop

or

pip install -e <path_to/my_project>

so that I can use the most recent version of my code. Since I develop mostly statistical or machine learning algorithms, I prototype a lot and change my code daily. However, recently the recommended way to run our experiments on the clusters I have access to is through Docker. I learned about Docker and I think I have a rough idea of how to make it work, but I wasn't quite sure if my solution was good or if there might be better solutions out there.

The first solution I thought of was to copy my code into the Docker image with:

COPY /path_to/my_project /path_to/my_project
RUN pip install /path_to/my_project

and then pip installing it. The issue with this solution is that I have to actually build a new image each time, which seems silly, and I was hoping I could have something better. To do this I was thinking of having a bash file like:

#BASH FILE TO BUILD AND REBUILD MY STUFF
# build the image with the newest version of
# my project code and pip install it and its dependencies
docker build -t image_name .
docker run --rm image_name python run_ML_experiment_file.py
docker kill current_container # not sure how to get the id of the container
docker rmi image_name

As I said, my intuition tells me this is silly, so I was hoping there was a single-command way to do this with Docker or with a single Dockerfile. Also, note that the command should use -v ~/data/:/data to be able to get the data, plus some other volume/mount to write to (on the host) when it finishes training.

Another solution I thought of was to have all the Python dependencies (or other dependencies) that my library needs in the Dockerfile (and hence in the image) and then somehow execute the installation of my library in the running container. Maybe with docker exec [OPTIONS] CONTAINER COMMAND as:

docker exec CONTAINER pip install /path_to/my_project

in the running container. After that then I could run the real experiment I want to run with the same exec command:

docker exec CONTAINER python run_ML_experiment_file.py

Though I still don't know how to systematically get the container id (because I probably don't want to look up the container id every time I do this).
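
For example, I imagine something like the following might work if I give the container a fixed name so I never have to look up its id (my_experiments and image_name are just placeholder names):

docker run -d --name my_experiments -v ~/my_project:/path_to/my_project image_name sleep infinity
docker exec my_experiments pip install /path_to/my_project
docker exec my_experiments python run_ML_experiment_file.py
docker rm -f my_experiments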

Ideally, in my head the best conceptual solution would be to simply have the Dockerfile know from the beginning which directory it should mount (i.e. /path_to/my_project) and then somehow do python [/path_to/my_project] develop inside the image so that it would always be linked to the potentially changing Python package/project. That way I can run my experiments with a single docker command, as in:

docker run --rm -v ~/data/:/data image_name python run_ML_experiment_file.py

and not have to explicitly update the image myself every time (that includes not having to reinstall parts of the image that should be static), since it is always in sync with the real library. Also, having some other script build a new image from scratch each time is not what I am looking for. It would also be nice to be able to avoid writing any bash, if possible.
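
In my head, the single command would then look roughly like the following (just a sketch; image_name is a placeholder for an image that already has the static dependencies baked in):

docker run --rm \
    -v ~/my_project:/my_project \
    -v ~/data/:/data \
    image_name \
    bash -c "pip install -e /my_project && python /my_project/run_ML_experiment_file.py"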


I think I am very close to a good solution. Instead of building a new image each time, I will simply run a CMD command that does python develop, as follows:

# install my library (only when a container is spun up)
CMD python ~/my_tf_proj/setup.py develop

The advantage is that it will only pip install my library whenever I run a new container. This solves the development issue, because re-creating a new image takes too long. Though I just realized that if I use the CMD command then I can't run other commands passed to docker run, so I actually mean to use ENTRYPOINT.
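
For example, I imagine an entrypoint script along these lines might install the library first and then run whatever command I pass to docker run (entrypoint.sh is just a name I made up; it assumes /my_tf_proj is mounted at run time):

entrypoint.sh:

#!/bin/bash
set -e
# install the (mounted) library in editable mode, then run whatever command was given
pip install -e /my_tf_proj
exec "$@"

and in the Dockerfile:

COPY entrypoint.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh
ENTRYPOINT ["/entrypoint.sh"]
CMD ["python", "run_ML_experiment_file.py"]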

Right now the only issue left to complete this is that I am having trouble using volumes, because I can't successfully link to my host project directory within the Dockerfile (which seems to require an absolute path for some reason). I am currently doing this (which doesn't seem to work):

VOLUME /absolute_path_to/my_tf_proj /my_tf_proj

Why can't I link using the VOLUME command in my Dockerfile? My main intention with using VOLUME is making my library (and other files that are always needed by this image) accessible when the CMD command tries to install my library. Is it possible to just have my library available all the time when a container is initiated?

Ideally I wanted to just have the library installed automatically when a container is run and, if possible, since the most recent version of the library is always required, have it installed whenever a container is initialized.
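
From what I have read it seems the host path might have to be given at docker run time rather than in the Dockerfile, roughly like this (though I would prefer to keep everything in the Dockerfile):

# Dockerfile: only the container-side mount point
VOLUME /my_tf_proj

# at run time, on the host
docker run --rm -v /absolute_path_to/my_tf_proj:/my_tf_proj image_name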

As a reference, right now my non-working Dockerfile looks as follows:

# This means you derive your docker image from the tensorflow docker image
# FROM gcr.io/tensorflow/tensorflow:latest-devel-gpu
FROM gcr.io/tensorflow/tensorflow
#FROM python
#FROM ubuntu

RUN mkdir ~/my_tf_proj/
# mounts my tensorflow lib/proj from host to the container
VOLUME /absolute_path_to/my_tf_proj

#
RUN apt-get update

#
RUN apt-get install -qy vim

#
RUN apt-get install -qy python3
RUN apt-get install -qy python3-pip
RUN pip3 install --upgrade pip

#RUN apt-get install -y python python-dev python-distribute python-pip

# have the dependecies for my tensorflow library
RUN pip3 install numpy
RUN pip3 install keras
RUN pip3 install namespaces
RUN pip3 install pdb

# install my library (only when a container is spun up)
#CMD python ~/my_tf_proj/setup.py develop
ENTRYPOINT python ~/my_tf_proj/setup.py develop

As a side remark:

Also, for some reason it requires me to do RUN apt-get update to even be able to install pip or vim in my container. Does anyone know why? I wanted to install these because, in case I want to attach to the container with a bash terminal, it would be really helpful.

It seems that Docker just forces you to apt-get update so that you always have the most recent version of software in the container?


Bounty:

What about a solution with COPY? And perhaps docker build -f path/Dockerfile .? See: How does one build a docker image from the home user directory?

Upvotes: 7

Views: 3770

Answers (5)

I usually use a Dockerfile like this:

###########
# BUILDER #
###########

# pull official base image
FROM python:3.8.3-slim as builder

# set work directory
WORKDIR /usr/src/app

# set environment variables
ENV PYTHONDONTWRITEBYTECODE 1
ENV PYTHONUNBUFFERED 1

# install psycopg2 dependencies
RUN apt-get update \
    && apt-get -y install libpq-dev gcc \
    python3-dev musl-dev libffi-dev \
    && pip install psycopg2

# upgrade pip
RUN pip install --upgrade pip

# install dependencies
COPY ./requirements.txt .
RUN pip wheel --no-cache-dir --no-deps --wheel-dir /usr/src/app/wheels -r requirements.txt

# copy project
COPY . .

#########
# FINAL #
#########

# pull official base image
FROM python:3.8.3-slim

# create directory for the app user
RUN mkdir -p /home/app

# create the app user
RUN addgroup --system app && adduser --system --group app

# create the appropriate directories
ENV HOME=/home/app
ENV APP_HOME=/home/app/web
RUN mkdir $APP_HOME
RUN mkdir $APP_HOME/static
RUN mkdir $APP_HOME/media
RUN mkdir $APP_HOME/currencies
WORKDIR $APP_HOME

# install dependencies
RUN apt-get update && apt-get install -y libpq-dev bash netcat rabbitmq-server
COPY --from=builder /usr/src/app/wheels /wheels
COPY --from=builder /usr/src/app/requirements.txt .
COPY wait-for /bin/wait-for
COPY /log /var/log
COPY /run /var/run

RUN pip install --no-cache /wheels/*

# copy project
COPY . $APP_HOME

# chown all the files to the app user
RUN chown -R app:app $APP_HOME
RUN chown -R app:app /var/log/
RUN chown -R app:app /var/run/

EXPOSE 8000

# change to the app user
USER app

# only for Django
CMD ["gunicorn", "Config.asgi:application", "--bind", "0.0.0.0:8000", "--workers", "3", "-k","uvicorn.workers.UvicornWorker","--log-file","-"]

docker-compose.yml

# docker-compose.yml

version: "3.7"

services:
  db:
    container_name: postgres
    hostname: postgres
    image: postgres:12
    volumes:
      - postgres_data:/var/lib/postgresql/data/
    env_file:
      - .env.prod.db
    networks:
      - main
    restart: always

  pgbackups:
    container_name: pgbackups
    hostname: pgbackups
    image: prodrigestivill/postgres-backup-local
    restart: always
    user: postgres:postgres # Optional: see below
    volumes:
      - ./backups:/backups
    links:
      - db
    depends_on:
      - db
    env_file:
      .env.prod.db
    networks:
      - main

  web:
    build: .
    container_name: web
    expose:
      - 8000
    command: sh -c "wait-for db:5432\
      && python manage.py makemigrations&&python manage.py migrate&&gunicorn Config.asgi:application --bind 0.0.0.0:8000 -w 3 -k uvicorn.workers.UvicornWorker --log-file -"
    volumes:
      - static_volume:/home/app/web/static
      - media_volume:/home/app/web/media
    env_file:
      - .env.prod
    hostname: web
    image: web-image
    networks:
      - main
    depends_on:
      - db
    restart: always

  prometheus:
    container_name: prometheus
    image: prom/prometheus
    hostname: prometheus
    volumes:
      - ./prometheus/:/etc/prometheus/
    ports:
      - 9090:9090
    networks:
      - main
    depends_on:
      - web
    restart: always

  grafana:
    container_name: grafana
    image: grafana/grafana:6.5.2
    hostname: grafana
    ports:
      - 3060:3000
    networks:
      - main
    depends_on:
      - prometheus
    restart: always

  nginx:
    container_name: nginx
    image: nginx:alpine
    hostname: nginx
    volumes:
      - ./nginx/nginx.conf:/etc/nginx/conf.d/default.conf
      - ./wait-for:/bin/wait-for
      - static_volume:/home/app/web/static
      - media_volume:/home/app/web/media
    ports:
      - 80:80
    depends_on:
      - web
    networks:
      - main
    restart: always

networks:
  main:
    driver: bridge

volumes:
  static_volume:
  media_volume:
  postgres_data:

wait-for

#!/bin/sh

cmdname=$(basename "$0")
TIMEOUT=120
QUIET=0

echoerr() {
  if [ "$QUIET" -ne 1 ]; then printf "%s\n" "$*" 1>&2; fi
}

usage() {
  exitcode="$1"
  cat << USAGE >&2
Usage:
  $cmdname host:port [-t timeout] [-- command args]
  -q | --quiet                        Do not output any status messages
  -t TIMEOUT | --timeout=timeout      Timeout in seconds, zero for no timeout
  -- COMMAND ARGS                     Execute command with args after the test finishes
USAGE
  exit "$exitcode"
}

wait_for() {
  for i in `seq $TIMEOUT` ; do
    nc -z "$HOST" "$PORT" > /dev/null 2>&1

    result=$?
    if [ $result -eq 0 ] ; then
      if [ $# -gt 0 ] ; then
        exec "$@"
      fi
      exit 0
    fi
    sleep 1
  done
  echo "Operation timed out" >&2
  exit 1
}

while [ $# -gt 0 ]
do
  case "$1" in
    *:* )
    HOST=$(printf "%s\n" "$1"| cut -d : -f 1)
    PORT=$(printf "%s\n" "$1"| cut -d : -f 2)
    shift 1
    ;;
    -q | --quiet)
    QUIET=1
    shift 1
    ;;
    -t)
    TIMEOUT="$2"
    if [ "$TIMEOUT" = "" ]; then break; fi
    shift 2
    ;;
    --timeout=*)
    TIMEOUT="${1#*=}"
    shift 1
    ;;
    --)
    shift
    break
    ;;
    --help)
    usage 0
    ;;
    *)
    echoerr "Unknown argument: $1"
    usage 1
    ;;
  esac
done

if [ "$HOST" = "" -o "$PORT" = "" ]; then
  echoerr "Error: you need to provide a host and port to test."
  usage 2
fi

wait_for "$@"

nginx/nginx.conf

# nginx.conf

upstream back {
    server web:8000;
}
server {

    listen 80;

    location / {
        proxy_pass http://back;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $host;
        proxy_redirect off;
    }

    location /static/ {
     root /home/app/web/;
    }

    location /media/ {
     root /home/app/web/;
    }

}

prometheus/prometheus.yml

global:
  scrape_interval: 10s
  evaluation_interval: 10s

  external_labels:
    monitor: django-monitor

scrape_configs:
  - job_name: "main-django"
    metrics_path: /metrics
    tls_config:
      insecure_skip_verify: true
    static_configs:
      - targets:
        - host.docker.internal

  - job_name: 'prometheus'
    scrape_interval: 10s
    static_configs:
      - targets: [ 'host.docker.internal:9090' ]

.env.prod is unique for your project

.env.prod.db

POSTGRES_USER=
POSTGRES_PASSWORD=
POSTGRES_HOST=
POSTGRES_EXTRA_OPTS="-Z6 --schema=public --blobs"
SCHEDULE="@every 0h30m00s"
BACKUP_KEEP_DAYS=7
BACKUP_KEEP_WEEKS=4
BACKUP_KEEP_MONTHS=6
HEALTHCHECK_PORT=8080

Project run

docker build -t web-image .
docker-compose up

Project update

docker-compose up -d --build
docker-compose up

Run script

docker-compose exec web {script} 

Swarm

docker swarm init --advertise-addr 127.0.0.1:2377
docker stack deploy -c docker-compose.yml  proj

Remove swarm

docker stack rm proj
docker swarm leave --force

Upvotes: 0

Andrey Borisovich

Reputation: 241

I think you are looking for the bind mounts Docker feature. Check these docs: Use Bind Mounts. Using this you can just mount the host directory with your constantly changing python scripts, and it will be available in the container. If you need to mount only some specific directory with the constantly changing scripts, I would additionally make use of the pip command pip install -r requirements.txt and combine all your packages into a single requirements.txt file (as I see you repeat RUN pip3 install ... in your Dockerfile).
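
For example (a rough sketch only; the paths and file names are placeholders for your own project):

# requirements.txt
numpy
keras

# Dockerfile: install all dependencies from one file instead of repeated RUN pip3 install ...
COPY requirements.txt /tmp/requirements.txt
RUN pip3 install -r /tmp/requirements.txt

# at run time, bind mount the constantly changing code from the host
docker run --rm -v /absolute_path_to/my_tf_proj:/my_tf_proj image_name \
    python /my_tf_proj/run_ML_experiment_file.py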

Upvotes: 0

ascendants

Reputation: 2381

This may be reiterating some content from other good answers here, but here is my take. To clarify what I think your goals are, you want to 1) run the container without rebuilding it each time, and 2) have your most recent code be used when you launch the container.

To be blunt, achieving both (1) and (2) cannot be done without using a bind mount (-v host/dir:/docker/dir), ENV variables to switch between code versions as is done here, or building separate dev and production images. I.e., you cannot achieve both by using COPY, which would only get you (2).

  • Note that this is part of the philosophy of containers: they are meant to "freeze" your software exactly as it was when you built the image. The image itself is not meant to be dynamic (which is why containers are so great for reproducing results across environments!); to be dynamic, you must use bind mounts or other methods.

You can nearly achieve both goals if you do not mind doing a (quick) rebuild of your image each time; this is what Anthon's solution will provide. These rebuilds would be fast if you structure your code changes appropriately and make sure not to modify anything that is built earlier in the Dockerfile. This ensures that the preceding steps are not re-run each time you create a new image (since docker build ignores steps that have not changed).

With that in mind, here is a way to use COPY and docker build -f ... to achieve (2) only.

  • Note that again, this will require rebuilding the image each time since COPY will copy a static snapshot of whatever directory you specify; updates to that directory after running docker build ... will not be reflected.

Assuming that you will build the image while in your code directory (not your home directory*), you could add something like this to the end of the Dockerfile:

COPY . /python_app
ENTRYPOINT python /python_app/setup.py develop

and then build the image via:

docker build -t your:tag -f path/to/Dockerfile .

Note that this may be slower than Anthon's method since each rebuild would involve the entire code directory, rather than just your most recent changes (provided that you structure your code into static and development partitions).


*N.b. it is generally not advisable to COPY a large directory (e.g. your full home directory) since it can make the image very large (which may slow down your workflow when running the image on a cluster due to limited bandwidth or I/O!).
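
If your code directory also contains large files that are not needed inside the image (datasets, checkpoints, virtual environments, etc.), a .dockerignore file at the root of the build context keeps them out of the COPY. A minimal sketch (the entries are only examples):

# .dockerignore
data/
*.ckpt
*.h5
.git/
__pycache__/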

Regarding the apt-get update comment in your post: running update in the container ensures that later installs won't be using an old package index. So doing update is good practice, since the upstream source image will generally have an older package index, meaning that an install may fail without a prior update. See also In Docker, why is it recommended to run `apt-get` update in the Dockerfile?.
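
A common related pattern (just the usual best practice, not specific to your project) is to run update and the installs in the same RUN instruction, so the package index and the installs can never get out of sync in Docker's build cache:

RUN apt-get update && apt-get install -qy \
    python3 \
    python3-pip \
    vim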

Upvotes: 1

Anthon

Reputation: 76772

During development it is IMO perfectly fine to map/mount the host directory with your ever-changing sources into the Docker container. The rest (the Python version, the other libraries you depend upon) you can all install in the normal way in the Docker container.

Once stabilized I remove the map/mount and add the package to the list of items to install with pip. I do have a separate container running devpi so I can pip-install packages whether I push them all the way to PyPI or just push them to my local devpi container.

You can speed up container creation even if you use the common (but more limited) python [path_to_project/setup.py] develop. Your Dockerfile in this case should look like:

 # the following seldom changes, only when a package is added to setup.py
 COPY /some/older/version/of/project/plus/dependent/packages /older/setup
 RUN pip install /older/setup/your_package.tar.gz

 # the following changes all the time, but that is only a small amount of work
 COPY /latest/version/of/project /latest/project
 RUN python /latest/project/setup.py develop

If the first COPY would result in changes to files under /older/setup, then the image gets rebuilt from that point.

Running python ... develop still takes more time, and you need to rebuild/restart the container. Since my packages can all also just be copied in/linked to (in addition to being installed), that is still a large overhead. Instead I run a small program in the container that checks whether the (mounted/mapped) sources have changed and then reruns anything I am developing/testing automatically. So I only have to save a new version and watch the output of the container.
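
Such a watcher can be as simple as a small shell loop. A rough sketch using inotify-tools (the paths and the command are placeholders):

#!/bin/sh
# rerun the experiment whenever something under /my_tf_proj changes
while inotifywait -r -e modify,create,delete /my_tf_proj; do
    python /my_tf_proj/run_ML_experiment_file.py
done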

Upvotes: 4

YYY

Reputation: 6341

For deployment/distribution, it would be seamless to have a Docker image for your package. If not as an image, you need to transfer your source code to the environment where it needs to be run, configure a volume to have the source inside the container so that it can be built, etc. With an image it is just pull and run a container out of it.

But for ease and to get rid of manual steps in building the image, consider using docker-compose.

docker-compose.yml may look like this:

ml_experiment:
  build: <path/to/Dockerfile>
  volumes:
    - ~/data/:/data
  command: ["python", "run_ML_experiment_file.py"] 

Now to build an image and bring up a container you just need to do

docker-compose up --build

The --build option is a must in order to rebuild the image each time; otherwise docker-compose chooses to use the image already built.

Refer to https://docs.docker.com/compose/

Upvotes: 3
