user10837800

Reputation: 51

PyTorch w/ GPU on Docker Container Error - no CUDA-capable device is detected

I am trying to use PyTorch with a GPU in my Docker container.

1. On the host I have nvidia-docker installed, the CUDA driver, etc.

Here is the nvidia-smi output from the host:

Fri Mar 20 04:29:49 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.64.00    Driver Version: 440.64.00    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:00:04.0 Off |                    0 |
| N/A   33C    P8    28W / 149W |     16MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1860      G   /usr/lib/xorg/Xorg                            15MiB |
+-----------------------------------------------------------------------------+

2. On the Docker container (this is the Dockerfile for the app; the docker-compose file is below):

FROM ubuntu:latest
FROM dsksd/pytorch:0.4
#FROM nvidia/cuda:10.1-base-ubuntu18.04 
#FROM nablascom/cuda-pytorch
#FROM nvidia/cuda:10.0-base

RUN apt-get update -y --fix-missing
RUN apt-get install -y python3-pip python3-dev build-essential
RUN apt-get install -y sudo curl
#RUN sudo apt-get install -y nvidia-container-toolkit
#RUN apt-get install -y curl python3.7 python3-pip python3.7-dev python3.7-distutils build-essential
#RUN apt-get install -y curl
#RUN apt-get install -y sudo
#RUN curl -O http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_10.0.130-1_amd64.deb
#RUN sudo dpkg -i cuda-repo-ubuntu1604_10.0.130-1_amd64.deb
#RUN sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub
#RUN sudo apt-get install cuda -y
#----------
# Add the package repositories
#RUN distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
#RUN curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
#RUN curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
#RUN sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
#RUN sudo systemctl restart docker
ENV NVIDIA_VISIBLE_DEVICES all
ENV NVIDIA_DRIVER_CAPABILITIES compute,utility
ENV LD_LIBRARY_PATH $LD_LIBRARY_PATH:/usr/local/cuda-10.1/compat/
ENV PYTHONPATH $PATH
#----------
ENV LC_ALL=mylocale.utf8
COPY . /app
WORKDIR /app
RUN pip3 install -r requirements.txt
ENTRYPOINT ["python3"]
EXPOSE 5000
CMD ["hook.py"]

When I try running my code on the GPU, I run into:

>>> torch.cuda.current_device()
THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=50 error=100 : no CUDA-capable device is detected
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/dist-packages/torch/cuda/__init__.py", line 386, in current_device
    _lazy_init()
  File "/usr/local/lib/python3.6/dist-packages/torch/cuda/__init__.py", line 193, in _lazy_init
    torch._C._cuda_init()
RuntimeError: cuda runtime error (100) : no CUDA-capable device is detected at /pytorch/aten/src/THC/THCGeneral.cpp:50

I invoke the container using: docker-compose up --build
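
For reference, the same check can be reproduced inside the running backend service (service name taken from the compose file below) with a one-liner like:

docker-compose exec backend python3 -c "import torch; print(torch.cuda.is_available())"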

Here is my docker-compose.yaml file:

version: '3.6'
services:
  rdb:
    image: mysql:5.7
    #restart: always
    environment:
      MYSQL_DATABASE: 'c_rdb'
      MYSQL_USER: 'user'
      MYSQL_PASSWORD: 'password'
      MYSQL_ROOT_PASSWORD: '123123'
    #ports:
    #  - '3306:3306'
    #expose:
    #  - '3306'
    volumes:
      - rdb-data:/var/lib/mysql
      - ./init-db/init.sql:/docker-entrypoint-initdb.d/init.sql
  mongo:
    image: mongo
    #restart: always
    environment:
      MONGO_INITDB_ROOT_USERNAME: root
      MONGO_INITDB_ROOT_PASSWORD: 12312323
      MONGO_INITDB_DATABASE: chronicler_ndb
    volumes:
      - ndb-data:/data/db
      - ./init-db/init.js:/docker-entrypoint-initdb.d/init.js
    ports:
      - '27017-27019:27017-27019'
  mongo-express:
    image: mongo-express
    #restart: always
    depends_on:
        - mongo
        - backend
    ports:
      - 8081:8081
    environment:
      ME_CONFIG_MONGODB_ADMINUSERNAME: rooer
      ME_CONFIG_MONGODB_ADMINPASSWORD: 123123
  redis:
    image: redis:latest
    command: ["redis-server", "--appendonly", "yes"]
    hostname: redis
    #ports:
    #  - "6379:6379"
    volumes:
      - cache-data:/data
  backend:
    build: ./app
    ports:
     - "5000:5000"
    volumes:
     - backend-data:/code
    links: 
     - rdb
     - redis

volumes:
  rdb-data:
    name: c-relational-data
  ndb-data:
    name: c-nosql-data
  cache-data:
    name: redis-data
  backend-data:
    name: backend-engine

Upvotes: 4

Views: 10207

Answers (3)

Jian Yin

Reputation: 1

Maybe you can try docker run --gpus all xxx when you run the Docker image.
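
For example, a quick smoke test with an NVIDIA CUDA base image (the tag is just an illustration; --gpus requires Docker 19.03+ with the NVIDIA container toolkit) would be:

docker run --rm --gpus all nvidia/cuda:10.2-base nvidia-smi

If this prints the same table as nvidia-smi on the host, the container runtime can see the GPU and the problem is likely in the compose configuration rather than the host driver.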

Upvotes: 0

Crawl.W

Reputation: 421

I got cudaErrorNoDevice (defined by cudaError_t), which means:

This indicates that no CUDA-capable devices were detected by the installed CUDA driver.

It happened when I closed my laptop lid (not a shutdown) while Docker, which had been working fine, kept running. In that situation, restarting Docker fixes it.
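
On a systemd-based host (an assumption; adjust for your init system), that restart looks like:

sudo systemctl restart docker
sudo systemctl status docker   # confirm the daemon came back up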

Upvotes: 0

Smankusors

Reputation: 387

It needs the runtime option, but that option is not available in Compose file format 3. So there are a couple of options:

  1. Downgrade your Compose file to format 2.3 (the first 2.x version that supports the runtime key), so something like this:
version: '2.3'
services:
  backend:
    build: ./app
    ports:
     - "5000:5000"
    volumes:
     - backend-data:/code
    links:
     - rdb
     - redis
    runtime: nvidia
  2. Or, manually run the container using docker run with the --runtime=nvidia argument.
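
A rough sketch of that (the image name is a placeholder, and --runtime=nvidia assumes nvidia-docker2 is installed on the host):

docker run --rm --runtime=nvidia -p 5000:5000 <your-backend-image>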

Also, I recommend using an image built by NVIDIA instead of ubuntu:latest.
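
A minimal sketch of that change, reusing the base image already commented out in the question's Dockerfile (the tag should match the CUDA version your PyTorch build expects):

FROM nvidia/cuda:10.1-base-ubuntu18.04
# install Python on top of the CUDA-enabled base image
RUN apt-get update && apt-get install -y python3-pip python3-dev build-essential
RUN pip3 install torch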


For more information, you can read the issue here

Upvotes: 2
