Reputation: 4327
I'm trying to merge two Docker images.
Here is my Dockerfile:
FROM nvidia/cuda:10.0-devel-ubuntu18.04 AS cuda10
FROM osrf/ros:foxy-desktop
COPY --from=cuda10 /usr/local/cuda-10.0 /usr/local/cuda-10.0
RUN cd /usr/local && ln -s cuda-10.0 cuda
COPY --from=cuda10 \
/usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.460.32.03 \
/usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.410.129 \
/usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.410.129 \
/usr/lib/x86_64-linux-gnu/libnvidia-compiler.so.460.32.03 \
/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.460.32.03 \
/usr/lib/x86_64-linux-gnu/libnvidia-allocator.so.460.32.03 \
/usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.460.32.03 \
/usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.460.32.03 \
/usr/lib/x86_64-linux-gnu/libcuda.so.410.129 \
/usr/lib/x86_64-linux-gnu/libcuda.so.460.32.03 \
/usr/lib/x86_64-linux-gnu/
Build fails:
$ docker build . -t nvidia-ros:osrf
Step 5/7 : COPY --from=cuda10 /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.460.32.03 /usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.410.129 /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.410.129 /usr/lib/x86_64-linux-gnu/libnvidia-compiler.so.460.32.03 /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.460.32.03 /usr/lib/x86_64-linux-gnu/libnvidia-allocator.so.460.32.03 /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.460.32.03 /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.460.32.03 /usr/lib/x86_64-linux-gnu/libcuda.so.410.129 /usr/lib/x86_64-linux-gnu/libcuda.so.460.32.03 /usr/lib/x86_64-linux-gnu/
COPY failed: stat usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.460.32.03: file does not exist
However, these files do exist when I run the image:
$ docker run -it --rm --gpus all nvidia/cuda:10.0-devel-ubuntu18.04
root@fc9c1d8ccdc2:/# ls -la /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.*
lrwxrwxrwx 1 root root 37 Jan 30 14:13 /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.1 -> libnvidia-ptxjitcompiler.so.460.32.03
-rw-r--r-- 1 root root 12129448 Aug 20 2019 /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.410.129
-rw-r--r-- 1 root root 10516984 Dec 27 18:55 /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.460.32.03
Upvotes: 2
Views: 1090
Reputation: 20306
TL;DR: This file is mounted by the NVIDIA runtime (docs), so it is not present at build time. For the runtime to mount the driver libraries into a container, a few environment variables must be set, either in the image or at container start. See the Dockerfile at the end for an example.
To investigate this I ran this command first:
docker run --rm --entrypoint="" -it nvidia/cuda:10.0-devel-ubuntu18.04 \
stat /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.460.32.03
And got the same error:
stat: cannot stat '/usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.460.32.03': No such file or directory
So I went into the directory and looked with ls:
root@8c34c353bcbb:/usr/lib/x86_64-linux-gnu# ls libnvidia-ptxjitcompiler.so
ls: cannot access 'libnvidia-ptxjitcompiler.so': No such file or directory
root@8c34c353bcbb:/usr/lib/x86_64-linux-gnu# ls libn
libnccl.so libnccl_static.a libnpth.so.0 libnsl.so libnss_files.so libnss_nisplus.so
libnccl.so.2 libnettle.so.6 libnpth.so.0.1.1 libnss_compat.so libnss_hesiod.so
libnccl.so.2.6.4 libnettle.so.6.4 libnsl.a libnss_dns.so libnss_nis.so
The file was missing.
Then I started the container with the NVIDIA runtime, much like the command you shared:
docker run -it --rm --runtime nvidia nvidia/cuda:10.0-devel-ubuntu18.04
root@4a1602f3d5c0:/# ls -la /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.*
lrwxrwxrwx 1 root root 34 Jan 30 14:48 /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.1 -> libnvidia-ptxjitcompiler.so.450.66
-rw-r--r-- 1 root root 12129448 Aug 20 2019 /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.410.129
-rwxr-xr-x 1 root root 9947144 Sep 28 10:57 /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.450.66
The files were there, but the version was different and it matched my NVIDIA driver version:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.66 Driver Version: 450.66 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
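The comparison between the library suffix and the driver version can also be scripted. This is only a rough sketch: lib_version and matches_driver are hypothetical helper names, and it assumes the library follows the name.so.VERSION pattern seen in the listing above.

```shell
# Print the version suffix of a versioned shared-library file name,
# e.g. libnvidia-ptxjitcompiler.so.450.66 -> 450.66
lib_version() {
    name=${1##*/}                     # drop any leading directory
    printf '%s\n' "${name##*.so.}"    # drop everything up to and including ".so."
}

# Succeed when the library's version suffix equals the given driver version.
matches_driver() {
    [ "$(lib_version "$1")" = "$2" ]
}

# Example use on a host with the NVIDIA driver installed (not executed here):
#   driver=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
#   matches_driver /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.450.66 "$driver" \
#       && echo "library version matches the host driver"
```

A match suggests the file came from the host driver rather than from an image layer, which is why it cannot be found at build time.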
So it appeared to me that this file only exists when you use the NVIDIA runtime to start the container. I searched for this and found confirmation here. The documentation states that a container must run with several environment variables set for the driver libs to be mounted. So I ran the env command in an official NVIDIA container and copied every variable with the NVIDIA_ prefix into the Dockerfile:
FROM nvidia/cuda:10.0-devel-ubuntu18.04 AS cuda10
FROM osrf/ros:foxy-desktop
COPY --from=cuda10 /usr/local/cuda-10.0 /usr/local/cuda-10.0
RUN cd /usr/local && ln -s cuda-10.0 cuda
ENV NVIDIA_DRIVER_CAPABILITIES=compute,utility
# Quoted so the whole space-separated constraint is stored as a single value
ENV NVIDIA_REQUIRE_CUDA="cuda>=10.0 brand=tesla,driver>=384,driver<385 brand=tesla,driver>=410,driver<411"
ENV NVIDIA_VISIBLE_DEVICES=all
Running the new image with NVIDIA runtime I found the files mounted:
docker run --runtime nvidia --rm -it afae756457a9
root@7ebdef701231:/# stat /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.450.66
File: /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.450.66
Size: 9947144 Blocks: 19432 IO Block: 4096 regular file
Device: 801h/2049d Inode: 131438 Links: 1
Access: (0755/-rwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2021-01-30 14:48:05.765015216 +0000
Modify: 2020-09-28 10:57:18.067125173 +0000
Change: 2020-09-28 10:57:18.067125173 +0000
Birth: -
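If you would rather not bake the variables into the image, they can be passed at container start instead, as mentioned in the TL;DR. Below is a minimal sketch, not something I ran against your image: run_with_nvidia_env and the DOCKER override are hypothetical names added for illustration, and it assumes the NVIDIA runtime is installed on the host.

```shell
# Hypothetical helper; the function name and the DOCKER override are illustrative only.
# Set DOCKER=echo to print the arguments instead of starting a container.
run_with_nvidia_env() {
    image=$1
    shift
    "${DOCKER:-docker}" run --rm -it --runtime nvidia \
        -e NVIDIA_VISIBLE_DEVICES=all \
        -e NVIDIA_DRIVER_CAPABILITIES=compute,utility \
        "$image" "$@"
}

# Example (image tag taken from the question; not executed here):
#   run_with_nvidia_env nvidia-ros:osrf ls /usr/lib/x86_64-linux-gnu/
```

Setting the variables in the Dockerfile has the advantage that nobody has to remember the extra flags, while passing them with -e keeps the image unchanged.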
Upvotes: 1