Rahul Bohare
Rahul Bohare

Reputation: 821

How to conda install CUDA enabled PyTorch in a Docker container?

I am trying to build a Docker container on a server within which a conda environment is built. All the other requirements are satisfied except for CUDA enabled PyTorch (I can get PyTorch working without CUDA however, no problem). How do I make sure PyTorch is using CUDA?

This is the Dockerfile :

# Use nvidia/cuda image
FROM nvidia/cuda:10.2-cudnn7-devel-ubuntu18.04

# set bash as current shell
RUN chsh -s /bin/bash

# install anaconda
RUN apt-get update
RUN apt-get install -y wget bzip2 ca-certificates libglib2.0-0 libxext6 libsm6 libxrender1 git mercurial subversion && \
        apt-get clean
RUN wget --quiet https://repo.anaconda.com/archive/Anaconda3-2020.02-Linux-x86_64.sh -O ~/anaconda.sh && \
        /bin/bash ~/anaconda.sh -b -p /opt/conda && \
        rm ~/anaconda.sh && \
        ln -s /opt/conda/etc/profile.d/conda.sh /etc/profile.d/conda.sh && \
        echo ". /opt/conda/etc/profile.d/conda.sh" >> ~/.bashrc && \
        find /opt/conda/ -follow -type f -name '*.a' -delete && \
        find /opt/conda/ -follow -type f -name '*.js.map' -delete && \
        /opt/conda/bin/conda clean -afy

# set path to conda
ENV PATH /opt/conda/bin:$PATH


# setup conda virtual environment
COPY ./requirements.yaml /tmp/requirements.yaml
RUN conda update conda \
    && conda env create --name camera-seg -f /tmp/requirements.yaml \
    && conda install -y -c conda-forge -n camera-seg flake8

# From the pythonspeed tutorial; Make RUN commands use the new environment
SHELL ["conda", "run", "-n", "camera-seg", "/bin/bash", "-c"]

# PyTorch with CUDA 10.2
RUN conda activate camera-seg && conda install pytorch torchvision cudatoolkit=10.2 -c pytorch

RUN echo "conda activate camera-seg" > ~/.bashrc
ENV PATH /opt/conda/envs/camera-seg/bin:$PATH

This gives me the following error when I try to build this container ( docker build -t camera-seg . ):

.....

Step 10/12 : RUN conda activate camera-seg && conda install pytorch torchvision cudatoolkit=10.2 -c pytorch
 ---> Running in e0dd3e648f7b
ERROR conda.cli.main_run:execute(34): Subprocess for 'conda run ['/bin/bash', '-c', 'conda activate camera-seg && conda install pytorch torchvision cudatoolkit=10.2 -c pytorch']' command failed.  (See above for error)

CommandNotFoundError: Your shell has not been properly configured to use 'conda activate'.
To initialize your shell, run

    $ conda init <SHELL_NAME>

Currently supported shells are:
  - bash
  - fish
  - tcsh
  - xonsh
  - zsh
  - powershell

See 'conda init --help' for more information and options.

IMPORTANT: You may need to close and restart your shell after running 'conda init'.



The command 'conda run -n camera-seg /bin/bash -c conda activate camera-seg && conda install pytorch torchvision cudatoolkit=10.2 -c pytorch' returned a non-zero code: 1

This is the requirements.yaml:

name: camera-seg
channels:
  - defaults
  - conda-forge
dependencies:
  - python=3.6
  - numpy
  - pillow
  - yaml
  - pyyaml
  - matplotlib
  - jupyter
  - notebook
  - tensorboardx
  - tensorboard
  - protobuf
  - tqdm

When I put pytorch, torchvision and cudatoolkit=10.2 within the requirements.yaml, then PyTorch is successfully installed but it cannot recognize CUDA ( torch.cuda.is_available() returns False ).

I have tried various solutions, for example, this, this and this and some different combinations of them but all to no avail.

Any help is much appreciated. Thanks.

Upvotes: 15

Views: 26911

Answers (3)

Mhadhbi issam
Mhadhbi issam

Reputation: 420

Check nvidia drivers are installed :

nvidia-smi -a

if not working try :

sudo apt-get install linux-headers-$(uname -r)
distribution=$(. /etc/os-release;echo $ID$VERSION_ID | sed -e 's/\.//g')
wget https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64/cuda-keyring_1.0-1_all.deb
sudo dpkg -i cuda-keyring_1.0-1_all.deb
sudo apt-get update
sudo apt-get -y install cuda-drivers

Note : Reboot your os after this installation

Install nvidia docker run-time on you machine :

curl -s -L https://nvidia.github.io/nvidia-container-runtime/gpgkey | sudo apt-key add -
distribution="ubuntu22.04"
curl -s -L https://nvidia.github.io/nvidia-container-runtime/$distribution/nvidia-container-runtime.list | sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
echo -e  '{\n    "default-runtime": "nvidia",\n    "runtimes": {\n        "nvidia": {\n            "path": "/usr/bin/nvidia-container-runtime",\n            "runtimeArgs": []\n        }\n    }\n}' > /etc/docker/daemon.json
sudo systemctl restart docker

Verify nvidia is setup for the docker :

sudo docker info | grep Runtime

code for docker file :

FROM nvidia/cuda:11.8.0-runtime-ubuntu22.04
RUN apt-get update
RUN apt-get install -y locales
RUN localedef -i en_US -c -f UTF-8 -A /usr/share/locale/locale.alias en_US.UTF-8
ENV LANG=en_US.utf8
RUN export LANG=en_US.utf8

##################################################
#           install dependecies                  #
##################################################
RUN apt-get update && apt-get install -y wget git build-essential nano unzip curl 
RUN apt-get update && apt-get install -y g++



##################################################
#           install conda                        #
##################################################
RUN wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
RUN bash miniconda.sh -b -p /miniconda
RUN rm miniconda.sh
ENV PATH="/miniconda/bin:${PATH}"
RUN echo 'export PATH="/miniconda/bin:${PATH}"' >> ~/.bashrc
RUN /miniconda/bin/conda config --set auto_activate_base true
RUN /miniconda/bin/conda init

##################################################
#           install PYTORCH                      #
##################################################
RUN /miniconda/bin/conda  install -y pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia

##################################################
#           install code-server                  #
##################################################
RUN curl -fsSL https://code-server.dev/install.sh | sh

##################################################
#                  CMD                           #
##################################################
EXPOSE 5001
CMD code-server --bind-addr 0.0.0.0:5001 --auth none /home

Upvotes: 1

nim.py
nim.py

Reputation: 477

I managed to set it up with the following Dockerfile:

FROM nvidia/cuda:11.3.1-devel-ubuntu20.04
ENV TZ=Europe/Brussels

RUN apt-get update --fix-missing && DEBIAN_FRONTEND=noninteractive apt-get install --assume-yes --no-install-recommends \
   build-essential \
   python3 \
   python3-dev \
   python3-pip

RUN pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116

I made sure the cuda version is the same as installed on the machine where the docker container would be running.

Then I did docker build and run as follows:

$ docker build . -t docker-example:latest
$ docker run --gpus all --interactive --tty docker-example:latest

Inside the docker container, inside a python shell, torch.cuda.is_available() would then return True.

Upvotes: 5

Rahul Bohare
Rahul Bohare

Reputation: 821

I got it working after many, many tries. Posting the answer here in case it helps anyone.

Basically, I installed pytorch and torchvision through pip (from within the conda environment) and rest of the dependencies through conda as usual.

This is how the final Dockerfile looks:

# Use nvidia/cuda image
FROM nvidia/cuda:10.2-cudnn7-devel-ubuntu18.04

# set bash as current shell
RUN chsh -s /bin/bash
SHELL ["/bin/bash", "-c"]

# install anaconda
RUN apt-get update
RUN apt-get install -y wget bzip2 ca-certificates libglib2.0-0 libxext6 libsm6 libxrender1 git mercurial subversion && \
        apt-get clean
RUN wget --quiet https://repo.anaconda.com/archive/Anaconda3-2020.02-Linux-x86_64.sh -O ~/anaconda.sh && \
        /bin/bash ~/anaconda.sh -b -p /opt/conda && \
        rm ~/anaconda.sh && \
        ln -s /opt/conda/etc/profile.d/conda.sh /etc/profile.d/conda.sh && \
        echo ". /opt/conda/etc/profile.d/conda.sh" >> ~/.bashrc && \
        find /opt/conda/ -follow -type f -name '*.a' -delete && \
        find /opt/conda/ -follow -type f -name '*.js.map' -delete && \
        /opt/conda/bin/conda clean -afy

# set path to conda
ENV PATH /opt/conda/bin:$PATH


# setup conda virtual environment
COPY ./requirements.yaml /tmp/requirements.yaml
RUN conda update conda \
    && conda env create --name camera-seg -f /tmp/requirements.yaml

RUN echo "conda activate camera-seg" >> ~/.bashrc
ENV PATH /opt/conda/envs/camera-seg/bin:$PATH
ENV CONDA_DEFAULT_ENV $camera-seg

And this is how the requirements.yaml looks like:

name: camera-seg
channels:
  - defaults
  - conda-forge
dependencies:
  - python=3.6
  - pip
  - numpy
  - pillow
  - yaml
  - pyyaml
  - matplotlib
  - jupyter
  - notebook
  - tensorboardx
  - tensorboard
  - protobuf
  - tqdm
  - pip:
    - torch
    - torchvision

Then I build the container using the command docker build -t camera-seg . and PyTorch is now being able to recognize CUDA.

Upvotes: 19

Related Questions