Tim

Reputation: 123

Nvidia Docker in WSL2: Error Response From Daemon: OCI Runtime Create Failed

I want to use nvidia-docker in WSL2 - Ubuntu 18.04 LTS. When I type:

nvidia-smi

It returns:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.41.01    Driver Version: 496.49       CUDA Version: 11.5     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0 Off |                  N/A |
| N/A   53C    P8    12W /  N/A |    370MiB /  6144MiB |     N/A      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Everything seems fine. However, when I type:

nvidia-docker run -it --name="medical" -v /home/tim:/root -p 8892:8892 -p 6010:6010 timliu98/us_backup:compatible /bin/bash

It returns an error:

docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #1:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: mount error: file creation failed: /var/lib/docker/overlay2/6168a51b56d40ba448bd429ed67550daf7c6da438b04e3dbb3499401af4f3007/merged/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1: file exists: unknown.
ERRO[0001] error waiting for container: context canceled
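
(For reference, nvidia-docker is the CLI wrapper shipped by the nvidia-docker2 package, which as far as I understand just injects --runtime=nvidia, so the equivalent plain Docker invocation with Docker 19.03+ and the NVIDIA Container Toolkit should be roughly:)

docker run -it --gpus all --name="medical" -v /home/tim:/root -p 8892:8892 -p 6010:6010 timliu98/us_backup:compatible /bin/bash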

Kernel:

Linux LAPTOP-ITV4QSFR 5.10.16.3-microsoft-standard-WSL2 #1 SMP Fri Apr 2 22:23:49 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Docker version:

Client: Docker Engine - Community
 Version:           20.10.12
 API version:       1.41
 Go version:        go1.16.12
 Git commit:        e91ed57
 Built:             Mon Dec 13 11:45:27 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.12
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.16.12
  Git commit:       459d0df
  Built:            Mon Dec 13 11:43:36 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.12
  GitCommit:        7b11cfaabd73bb80907dd23182b9347b4245eb5d
 runc:
  Version:          1.0.2
  GitCommit:        v1.0.2-0-g52b36a2
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Nvidia packages version:

Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                                    Version                  Architecture             Description
+++-=======================================-========================-========================-===================================================================================
ii  libnvidia-container-tools               1.8.0-1                  amd64                    NVIDIA container runtime library (command-line tools)
ii  libnvidia-container1:amd64              1.8.0-1                  amd64                    NVIDIA container runtime library
un  nvidia-container-runtime                <none>                   <none>                   (no description available)
un  nvidia-container-runtime-hook           <none>                   <none>                   (no description available)
ii  nvidia-container-toolkit                1.8.0-1                  amd64                    NVIDIA container runtime hook
un  nvidia-docker                           <none>                   <none>                   (no description available)
ii  nvidia-docker2                          2.9.0-1                  all                      nvidia-docker CLI wrapper

Nvidia container library version:

cli-version: 1.8.0
lib-version: 1.8.0
build date: 2022-02-04T09:17+00:00
build revision: 05959222fe4ce312c121f30c9334157ecaaee260
build compiler: x86_64-linux-gnu-gcc-7 7.5.0
build platform: x86_64
build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -DNDEBUG -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fplan9-extensions -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -Wl,-zrelro -Wl,-znow -Wl,-zdefs -Wl,--gc-sections

Does anyone know a potential solution? I have tried many tutorials, but unfortunately all of them failed. Thanks in advance, and should you need more information, please let me know.

Upvotes: 1

Views: 2655

Answers (1)

Alexander Higgins

Reputation: 6925

After running nvidia-smi -a, I got the error "GPU access blocked by the operating system" in WSL2.

As explained in "What's new in Windows 10, version 21H2", the fix is to make sure that Windows 21H2 is installed via Windows Update in order to enable GPU support for WSL.
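
To double-check that the update actually took (this check is my own suggestion, not part of the original fix; Windows 10 21H2 corresponds to OS build 19044), you can query the build from PowerShell, or from inside WSL via interop:

PS C:\> [System.Environment]::OSVersion.Version   # should report 10.0.19044 or later
$ cmd.exe /c ver                                  # same check from inside the WSL shell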

After installing the update and rebooting, I was able to continue PyTorch development, having verified GPU access as suggested by Docker:

PS C:\Users\alexh> wsl
alexh@DESKTOP-U21F0MC:/mnt/c/Users/alexh$ docker run --rm -it --gpus=all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark

Output:

Run "nbody -benchmark [-numbodies=<numBodies>]" to measure performance.
        -fullscreen       (run n-body simulation in fullscreen mode)
        -fp64             (use double precision floating point values for simulation)
        -hostmem          (stores simulation data in host memory)
        -benchmark        (run benchmark to measure performance)
        -numbodies=<N>    (number of bodies (>= 1) to run in simulation)
        -device=<d>       (where d=0,1,2.... for the CUDA device to use)
        -numdevices=<i>   (where i=(number of CUDA devices > 0) to use for simulation)
        -compare          (compares simulation results running once on the default GPU and once on the CPU)
        -cpu              (run n-body simulation on the CPU)
        -tipsy=<file.bin> (load a tipsy model file for simulation)

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> 1 Devices used for simulation
GPU Device 0: "Ampere" with compute capability 8.6

> Compute 8.6 CUDA device: [NVIDIA GeForce RTX 3080]
69632 bodies, total time for 10 iterations: 59.047 ms
= 821.146 billion interactions per second
= 16422.926 single-precision GFLOP/s at 20 flops per interaction

Upvotes: 1
