I am trying to build a Docker container on a server, inside which a conda environment is created. All the other requirements install fine except CUDA-enabled PyTorch (PyTorch without CUDA works with no problem). How do I make sure PyTorch uses CUDA?
This is the Dockerfile:
# Use nvidia/cuda image
FROM nvidia/cuda:10.2-cudnn7-devel-ubuntu18.04
# set bash as current shell
RUN chsh -s /bin/bash
# install anaconda
RUN apt-get update
RUN apt-get install -y wget bzip2 ca-certificates libglib2.0-0 libxext6 libsm6 libxrender1 git mercurial subversion && \
apt-get clean
RUN wget --quiet https://repo.anaconda.com/archive/Anaconda3-2020.02-Linux-x86_64.sh -O ~/anaconda.sh && \
/bin/bash ~/anaconda.sh -b -p /opt/conda && \
rm ~/anaconda.sh && \
ln -s /opt/conda/etc/profile.d/conda.sh /etc/profile.d/conda.sh && \
echo ". /opt/conda/etc/profile.d/conda.sh" >> ~/.bashrc && \
find /opt/conda/ -follow -type f -name '*.a' -delete && \
find /opt/conda/ -follow -type f -name '*.js.map' -delete && \
/opt/conda/bin/conda clean -afy
# set path to conda
ENV PATH /opt/conda/bin:$PATH
# setup conda virtual environment
COPY ./requirements.yaml /tmp/requirements.yaml
RUN conda update conda \
&& conda env create --name camera-seg -f /tmp/requirements.yaml \
&& conda install -y -c conda-forge -n camera-seg flake8
# From the pythonspeed tutorial; Make RUN commands use the new environment
SHELL ["conda", "run", "-n", "camera-seg", "/bin/bash", "-c"]
# PyTorch with CUDA 10.2
RUN conda activate camera-seg && conda install pytorch torchvision cudatoolkit=10.2 -c pytorch
RUN echo "conda activate camera-seg" > ~/.bashrc
ENV PATH /opt/conda/envs/camera-seg/bin:$PATH
This gives me the following error when I build the container (docker build -t camera-seg .):
.....
Step 10/12 : RUN conda activate camera-seg && conda install pytorch torchvision cudatoolkit=10.2 -c pytorch
---> Running in e0dd3e648f7b
ERROR conda.cli.main_run:execute(34): Subprocess for 'conda run ['/bin/bash', '-c', 'conda activate camera-seg && conda install pytorch torchvision cudatoolkit=10.2 -c pytorch']' command failed. (See above for error)
CommandNotFoundError: Your shell has not been properly configured to use 'conda activate'.
To initialize your shell, run
$ conda init <SHELL_NAME>
Currently supported shells are:
- bash
- fish
- tcsh
- xonsh
- zsh
- powershell
See 'conda init --help' for more information and options.
IMPORTANT: You may need to close and restart your shell after running 'conda init'.
The command 'conda run -n camera-seg /bin/bash -c conda activate camera-seg && conda install pytorch torchvision cudatoolkit=10.2 -c pytorch' returned a non-zero code: 1
This is the requirements.yaml:
name: camera-seg
channels:
- defaults
- conda-forge
dependencies:
- python=3.6
- numpy
- pillow
- yaml
- pyyaml
- matplotlib
- jupyter
- notebook
- tensorboardx
- tensorboard
- protobuf
- tqdm
When I put pytorch, torchvision and cudatoolkit=10.2 in the requirements.yaml instead, PyTorch installs successfully but cannot see CUDA (torch.cuda.is_available() returns False).
I have tried various solutions (for example, this, this and this) and several combinations of them, all to no avail.
Any help is much appreciated. Thanks.
Upvotes: 15
First check that the NVIDIA driver works on the host:
nvidia-smi -a
If that fails, install the kernel headers and the NVIDIA driver:
sudo apt-get install linux-headers-$(uname -r)
distribution=$(. /etc/os-release;echo $ID$VERSION_ID | sed -e 's/\.//g')
wget https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64/cuda-keyring_1.0-1_all.deb
sudo dpkg -i cuda-keyring_1.0-1_all.deb
sudo apt-get update
sudo apt-get -y install cuda-drivers
Note: reboot the OS after this installation.
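After the reboot, nvidia-smi should list the GPU and the CUDA version the driver supports. If you want to script that check, here is a minimal sketch. It assumes the output contains a field like "CUDA Version: 11.8" (default table) or "CUDA Version : 11.8" (the -a report); the function names are hypothetical.

```python
import re
import subprocess
from typing import Optional, Tuple

# nvidia-smi reports the highest CUDA version the installed driver supports,
# e.g. "CUDA Version: 11.8" in the default table or
# "CUDA Version    : 11.8" in the `nvidia-smi -a` report (assumed format).
_CUDA_RE = re.compile(r"CUDA Version\s*:\s*(\d+)\.(\d+)")


def driver_cuda_version(smi_output: str) -> Optional[Tuple[int, int]]:
    """Return (major, minor) parsed from nvidia-smi output, or None."""
    match = _CUDA_RE.search(smi_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def query_driver_cuda_version() -> Optional[Tuple[int, int]]:
    """Run nvidia-smi locally; return None if it is missing or fails."""
    try:
        result = subprocess.run(
            ["nvidia-smi"], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        # OSError covers "nvidia-smi not installed"; a non-zero exit usually
        # means the driver is not loaded.
        return None
    return driver_cuda_version(result.stdout)
```

A None result means the driver is not usable yet, so installing the container toolkit will not help until that is fixed.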
Then install the NVIDIA Container Toolkit and make nvidia Docker's default runtime:
curl -s -L https://nvidia.github.io/nvidia-container-runtime/gpgkey | sudo apt-key add -
distribution="ubuntu22.04"
curl -s -L https://nvidia.github.io/nvidia-container-runtime/$distribution/nvidia-container-runtime.list | sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
# note: this replaces any existing /etc/docker/daemon.json
echo -e '{\n "default-runtime": "nvidia",\n "runtimes": {\n "nvidia": {\n "path": "/usr/bin/nvidia-container-runtime",\n "runtimeArgs": []\n }\n }\n}' | sudo tee /etc/docker/daemon.json
sudo systemctl restart docker
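Writing daemon.json with echo throws away any settings already in that file. A minimal sketch of merging the same nvidia runtime entry into an existing config instead, assuming the file (if present) holds a JSON object; the helper names are hypothetical.

```python
import copy
import json
from pathlib import Path

# Same settings as the echo command: register the nvidia runtime
# and make it Docker's default.
NVIDIA_RUNTIME = {"path": "/usr/bin/nvidia-container-runtime", "runtimeArgs": []}


def merge_nvidia_runtime(config: dict) -> dict:
    """Return a copy of a daemon.json dict with the nvidia runtime added.

    Other keys (registry mirrors, other runtimes, ...) are preserved,
    whereas `echo ... > daemon.json` replaces the whole file.
    """
    merged = copy.deepcopy(config)
    merged.setdefault("runtimes", {})["nvidia"] = copy.deepcopy(NVIDIA_RUNTIME)
    merged["default-runtime"] = "nvidia"
    return merged


def update_daemon_json(path: str = "/etc/docker/daemon.json") -> None:
    """Rewrite daemon.json in place with the merged config (needs root)."""
    file = Path(path)
    text = file.read_text().strip() if file.exists() else ""
    current = json.loads(text) if text else {}
    file.write_text(json.dumps(merge_nvidia_runtime(current), indent=2) + "\n")
```

Call update_daemon_json() as root, then restart Docker with sudo systemctl restart docker so the change takes effect.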
Verify that Docker now lists the nvidia runtime:
sudo docker info | grep Runtime
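The grep should show a "Runtimes:" line containing nvidia and a "Default Runtime:" line. A small sketch that checks the same thing from the docker info text, assuming those two line formats; the function names are hypothetical.

```python
def parse_runtimes(info_text: str) -> dict:
    """Extract the runtime list and default runtime from `docker info` text.

    Assumes lines like " Runtimes: io.containerd.runc.v2 nvidia runc"
    and " Default Runtime: nvidia", which `grep Runtime` selects.
    """
    runtimes, default = [], None
    for line in info_text.splitlines():
        key, sep, value = line.strip().partition(":")
        if not sep:
            continue  # not a "key: value" line
        if key == "Runtimes":
            runtimes = value.split()
        elif key == "Default Runtime":
            default = value.strip()
    return {"runtimes": runtimes, "default": default}


def nvidia_is_default(info_text: str) -> bool:
    """True when nvidia is registered and is the default runtime."""
    info = parse_runtimes(info_text)
    return "nvidia" in info["runtimes"] and info["default"] == "nvidia"
```

To use it, pass the stdout of subprocess.run(["docker", "info"], capture_output=True, text=True).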
Then build the image from a Dockerfile such as this one:
FROM nvidia/cuda:11.8.0-runtime-ubuntu22.04
RUN apt-get update
RUN apt-get install -y locales
RUN localedef -i en_US -c -f UTF-8 -A /usr/share/locale/locale.alias en_US.UTF-8
ENV LANG=en_US.utf8
##################################################
# install dependencies #
##################################################
RUN apt-get update && apt-get install -y wget git build-essential nano unzip curl
RUN apt-get update && apt-get install -y g++
##################################################
# install conda #
##################################################
RUN wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
RUN bash miniconda.sh -b -p /miniconda
RUN rm miniconda.sh
ENV PATH="/miniconda/bin:${PATH}"
RUN echo 'export PATH="/miniconda/bin:${PATH}"' >> ~/.bashrc
RUN /miniconda/bin/conda config --set auto_activate_base true
RUN /miniconda/bin/conda init
##################################################
# install PYTORCH #
##################################################
RUN /miniconda/bin/conda install -y pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
##################################################
# install code-server #
##################################################
RUN curl -fsSL https://code-server.dev/install.sh | sh
##################################################
# CMD #
##################################################
EXPOSE 5001
CMD code-server --bind-addr 0.0.0.0:5001 --auth none /home
Upvotes: 1
I managed to set it up with the following Dockerfile:
FROM nvidia/cuda:11.3.1-devel-ubuntu20.04
ENV TZ=Europe/Brussels
RUN apt-get update --fix-missing && DEBIAN_FRONTEND=noninteractive apt-get install --assume-yes --no-install-recommends \
build-essential \
python3 \
python3-dev \
python3-pip
RUN python3 -m pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116
I made sure the CUDA version matches the one installed on the machine where the Docker container will run.
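The CUDA version is encoded in the wheel index URL (cu116 for CUDA 11.6). A minimal sketch of building that URL and the pip command from a version string, assuming the naming pattern seen in the Dockerfile above carries over to other versions; the helpers are hypothetical and do not check that a wheel index exists for the version you pass.

```python
from typing import List


def pytorch_wheel_index(cuda_version: str) -> str:
    """Map a CUDA version such as "11.6" or "11.3.1" to a wheel index URL.

    Only major and minor are used: "11.6" -> .../whl/cu116.
    """
    major, minor = cuda_version.split(".")[:2]
    return f"https://download.pytorch.org/whl/cu{int(major)}{int(minor)}"


def pip_install_command(cuda_version: str) -> List[str]:
    """Build a pip command like the one in the Dockerfile for a CUDA version."""
    return [
        "python3", "-m", "pip", "install",
        "torch", "torchvision", "torchaudio",
        "--extra-index-url", pytorch_wheel_index(cuda_version),
    ]
```

For example, " ".join(pip_install_command("11.6")) reproduces the install line used in the Dockerfile (with python3 -m pip as the launcher).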
Then I built and ran the container as follows:
$ docker build . -t docker-example:latest
$ docker run --gpus all --interactive --tty docker-example:latest
Inside the container, in a Python shell, torch.cuda.is_available() then returned True.
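A slightly fuller check than torch.cuda.is_available() alone is sketched below. It also prints torch.version.cuda, the CUDA version the installed build was compiled against, which is None for a CPU-only build; that helps tell "wrong PyTorch build installed" apart from "GPU not visible in the container". The function name is hypothetical, and it degrades gracefully when torch is not installed.

```python
import importlib.util


def cuda_report() -> dict:
    """Summarise what PyTorch can see; safe to run where torch is absent."""
    if importlib.util.find_spec("torch") is None:
        return {"torch": None, "cuda_available": False}
    import torch

    report = {
        "torch": torch.__version__,
        # CUDA version this torch build was compiled against (None = CPU-only).
        "built_with_cuda": torch.version.cuda,
        # False here with a non-None built_with_cuda usually means the
        # container cannot see the GPU (e.g. started without --gpus all).
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        report["device"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    print(cuda_report())
```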
Upvotes: 5
I got it working after many, many tries. Posting the answer here in case it helps anyone.
Basically, I installed pytorch and torchvision through pip (from within the conda environment) and the rest of the dependencies through conda as usual.
This is the final Dockerfile:
# Use nvidia/cuda image
FROM nvidia/cuda:10.2-cudnn7-devel-ubuntu18.04
# set bash as current shell
RUN chsh -s /bin/bash
SHELL ["/bin/bash", "-c"]
# install anaconda
RUN apt-get update
RUN apt-get install -y wget bzip2 ca-certificates libglib2.0-0 libxext6 libsm6 libxrender1 git mercurial subversion && \
apt-get clean
RUN wget --quiet https://repo.anaconda.com/archive/Anaconda3-2020.02-Linux-x86_64.sh -O ~/anaconda.sh && \
/bin/bash ~/anaconda.sh -b -p /opt/conda && \
rm ~/anaconda.sh && \
ln -s /opt/conda/etc/profile.d/conda.sh /etc/profile.d/conda.sh && \
echo ". /opt/conda/etc/profile.d/conda.sh" >> ~/.bashrc && \
find /opt/conda/ -follow -type f -name '*.a' -delete && \
find /opt/conda/ -follow -type f -name '*.js.map' -delete && \
/opt/conda/bin/conda clean -afy
# set path to conda
ENV PATH /opt/conda/bin:$PATH
# setup conda virtual environment
COPY ./requirements.yaml /tmp/requirements.yaml
RUN conda update conda \
&& conda env create --name camera-seg -f /tmp/requirements.yaml
RUN echo "conda activate camera-seg" >> ~/.bashrc
ENV PATH /opt/conda/envs/camera-seg/bin:$PATH
# a literal name: "$camera-seg" would expand the undefined variable $camera
ENV CONDA_DEFAULT_ENV camera-seg
And this is the requirements.yaml:
name: camera-seg
channels:
- defaults
- conda-forge
dependencies:
- python=3.6
- pip
- numpy
- pillow
- yaml
- pyyaml
- matplotlib
- jupyter
- notebook
- tensorboardx
- tensorboard
- protobuf
- tqdm
- pip:
- torch
- torchvision
Then I build the container with docker build -t camera-seg . and PyTorch now recognizes CUDA.
Upvotes: 19