Reputation: 495
I am trying to build a docker image for a python script that I would like to deploy. This is the first time I am using docker so I'm probably doing something wrong but I have no clue what.
My System:
OS: Ubuntu 20.04
docker version: 19.03.8
I am using this Dockerfile:
# Dockerfile
FROM nvidia/cuda:11.0-base
COPY . /SingleModelTest
WORKDIR /SingleModelTest
RUN nvidia-smi
# make sure pip and git are installed before installing the requirements
RUN set -xe \
&& apt-get update \
&& apt-get install python3-pip -y \
&& apt-get install git -y
RUN pip3 install --upgrade pip
RUN pip3 install -r requirements/requirements1.txt
RUN pip3 install -r requirements/requirements2.txt #this is where it fails
ENTRYPOINT ["python"]
CMD ["TabNetAPI.py"]
The output from nvidia-smi is as expected:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02 Driver Version: 450.80.02 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce GTX 1050 Off | 00000000:01:00.0 On | N/A |
| 0% 54C P0 N/A / 90W | 1983MiB / 1995MiB | 18% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
So CUDA does work, but when I try to install the required packages from the requirements files, this happens:
command: /usr/bin/python3 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/SingleModelTest/src/mmdet/setup.py'"'"'; __file__='"'"'/SingleModelTest/src/mmdet/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' develop --no-deps
cwd: /SingleModelTest/src/mmdet/
Complete output (24 lines):
running develop
running egg_info
creating mmdet.egg-info
writing mmdet.egg-info/PKG-INFO
writing dependency_links to mmdet.egg-info/dependency_links.txt
writing requirements to mmdet.egg-info/requires.txt
writing top-level names to mmdet.egg-info/top_level.txt
writing manifest file 'mmdet.egg-info/SOURCES.txt'
reading manifest file 'mmdet.egg-info/SOURCES.txt'
writing manifest file 'mmdet.egg-info/SOURCES.txt'
running build_ext
building 'mmdet.ops.utils.compiling_info' extension
creating build
creating build/temp.linux-x86_64-3.8
creating build/temp.linux-x86_64-3.8/mmdet
creating build/temp.linux-x86_64-3.8/mmdet/ops
creating build/temp.linux-x86_64-3.8/mmdet/ops/utils
creating build/temp.linux-x86_64-3.8/mmdet/ops/utils/src
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DWITH_CUDA -I/usr/local/lib/python3.8/dist-packages/torch/include -I/usr/local/lib/python3.8/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.8/dist-packages/torch/include/TH -I/usr/local/lib/python3.8/dist-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.8 -c mmdet/ops/utils/src/compiling_info.cpp -o build/temp.linux-x86_64-3.8/mmdet/ops/utils/src/compiling_info.o -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=compiling_info -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11
mmdet/ops/utils/src/compiling_info.cpp:3:10: fatal error: cuda_runtime_api.h: No such file or directory
3 | #include <cuda_runtime_api.h>
| ^~~~~~~~~~~~~~~~~~~~
compilation terminated.
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
----------------------------------------
ERROR: Command errored out with exit status 1: /usr/bin/python3 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/SingleModelTest/src/mmdet/setup.py'"'"'; __file__='"'"'/SingleModelTest/src/mmdet/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' develop --no-deps Check the logs for full command output.
The package that fails is mmdetection. I am using 2 separate requirements files to make sure some packages are installed before others, to prevent a dependency failure.
requirements1.txt:
torch==1.4.0+cu100
-f https://download.pytorch.org/whl/torch_stable.html
torchvision==0.5.0+cu100
-f https://download.pytorch.org/whl/torch_stable.html
numpy==1.19.2
requirements2.txt:
addict==2.3.0
albumentations==0.5.0
appdirs==1.4.4
asynctest==0.13.0
attrs==20.2.0
certifi==2020.6.20
chardet==3.0.4
cityscapesScripts==2.1.7
click==7.1.2
codecov==2.1.10
coloredlogs==14.0
coverage==5.3
cycler==0.10.0
Cython==0.29.21
decorator==4.4.2
flake8==3.8.4
Flask==1.1.2
humanfriendly==8.2
idna==2.10
imagecorruptions==1.1.0
imageio==2.9.0
imgaug==0.4.0
iniconfig==1.1.1
isort==5.6.4
itsdangerous==1.1.0
Jinja2==2.11.2
kiwisolver==1.2.0
kwarray==0.5.9
MarkupSafe==1.1.1
matplotlib==3.3.2
mccabe==0.6.1
mmcv==0.4.3
-e git+https://github.com/open-mmlab/mmdetection.git@0f33c08d8d46eba8165715a0995841a975badfd4#egg=mmdet
networkx==2.5
opencv-python==4.4.0.44
opencv-python-headless==4.4.0.44
ordered-set==4.0.2
packaging==20.4
pandas==1.1.3
Pillow==6.2.2
pluggy==0.13.1
py==1.9.0
pycocotools==2.0.2
pycodestyle==2.6.0
pyflakes==2.2.0
pyparsing==2.4.7
pyquaternion==0.9.9
pytesseract==0.3.6
pytest==6.1.1
pytest-cov==2.10.1
pytest-runner==5.2
python-dateutil==2.8.1
pytz==2020.1
PyWavelets==1.1.1
PyYAML==5.3.1
requests==2.24.0
scikit-image==0.17.2
scipy==1.5.3
Shapely==1.7.1
six==1.15.0
terminaltables==3.1.0
tifffile==2020.9.3
toml==0.10.1
tqdm==4.50.2
typing==3.7.4.3
ubelt==0.9.2
urllib3==1.25.11
Werkzeug==1.0.1
xdoctest==0.15.0
yapf==0.30.0
The command I use to (try to) build the image:
nvidia-docker build -t firstdockertestsinglemodel:latest .
Things I have tried:
I'll be very grateful for any help that anyone could offer. If I need to supply more information I'll be happy to.
Upvotes: 2
Views: 12676
Reputation: 495
Thanks to @Robert Crovella I solved my problem.
It turned out I just needed to use nvidia/cuda:10.0-devel as the base image instead of nvidia/cuda:10.0-base, so my Dockerfile is now:
# Dockerfile
FROM nvidia/cuda:10.0-devel
RUN nvidia-smi
RUN set -xe \
&& apt-get update \
&& apt-get install python3-pip -y \
&& apt-get install git -y
RUN pip3 install --upgrade pip
WORKDIR /SingleModelTest
COPY requirements /SingleModelTest/requirements
ENV LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64
RUN pip3 install -r requirements/requirements1.txt
RUN pip3 install -r requirements/requirements2.txt
COPY . /SingleModelTest
ENTRYPOINT ["python"]
CMD ["TabNetAPI.py"]
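The switch matters because the -base variants ship only the minimal CUDA runtime, while the -devel variants also include the headers (such as cuda_runtime_api.h) and toolchain needed to compile mmdetection's C++/CUDA extensions. A quick way to see the difference for yourself (public image tags and the standard CUDA install path assumed):

```shell
# List the CUDA include directory in both image variants.
# In -base the headers are absent; in -devel cuda_runtime_api.h is present.
docker run --rm nvidia/cuda:10.0-base  ls /usr/local/cuda/include || echo "no headers in -base"
docker run --rm nvidia/cuda:10.0-devel ls /usr/local/cuda/include | grep cuda_runtime_api.h
```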
Upvotes: 5
Reputation: 5076
EDIT: this answer only tells you how to verify what's happening in your Docker image. Unfortunately, I'm unable to figure out why it is happening.
How to check it?
At each step of the docker build, you can see the various layers being generated. You can use a layer's ID to create a temporary image from it and check what's happening inside, e.g.
docker build -t my_bonk_example .
[...]
Removing intermediate container xxxxxxxxxxxxx
---> 57778e7c9788
Step 19/31 : RUN mkdir -p /tmp/spark-events
---> Running in afd21d853bcb
Removing intermediate container xxxxxxxxxxxxx
---> 33b26e1a2286 <-- let's use this ID
[ failure happens ]
docker run -it --rm --name bonk_container_before_failure 33b26e1a2286 bash
# now you're in the container
echo $LD_LIBRARY_PATH
ls /usr/local/cuda
side notes about your Dockerfile:
you can improve the build time of future builds by changing the order of the instructions in your Dockerfile. Docker uses a cache that gets invalidated the moment it finds something different from the previous build. Since you'll likely change your code more often than your image's requirements, it makes sense to move the COPY of the code after the apt and pip instructions. e.g.
# Dockerfile
FROM nvidia/cuda:10.2-base
RUN set -xe \
&& apt-get update \
&& apt-get install python3-pip -y \
&& apt-get install git -y
RUN pip3 install --upgrade pip
WORKDIR /SingleModelTest
COPY requirements /SingleModelTest/requirements
RUN pip3 install -r requirements/requirements1.txt
RUN pip3 install -r requirements/requirements2.txt
COPY . /SingleModelTest
RUN nvidia-smi
ENTRYPOINT ["python"]
CMD ["TabNetAPI.py"]
NOTE: this is just an example.
Concerning why the image doesn't build: I found that PyTorch 1.4 does not support CUDA 11.0 (https://discuss.pytorch.org/t/pytorch-with-cuda-11-compatibility/89254), but using a previous version of CUDA does not fix the issue either.
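As a side note, the CUDA build a PyTorch wheel targets is encoded in its local version suffix: torch==1.4.0+cu100 is a CUDA 10.0 build, which is why it clashes with a CUDA 11.0 base image. A small sketch of decoding that suffix from a pinned requirements line (the helper function is hypothetical, just for illustration):

```python
def cuda_version_from_pin(pin):
    """Extract the CUDA version from a pinned wheel spec like
    'torch==1.4.0+cu100'; returns None for builds without a cuXYZ suffix."""
    _, _, version = pin.partition("==")       # "1.4.0+cu100"
    _, _, local = version.partition("+")      # "cu100"
    if local.startswith("cu"):
        digits = local[2:]                    # "100" -> 10.0, "110" -> 11.0
        return f"{digits[:-1]}.{digits[-1]}"
    return None

print(cuda_version_from_pin("torch==1.4.0+cu100"))  # 10.0
print(cuda_version_from_pin("numpy==1.19.2"))       # None
```

Matching this suffix against the nvidia/cuda base image tag is a quick sanity check before building.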
Upvotes: 2