saikishor

Reputation: 928

How to use nvidia-docker to create a service in Docker Swarm

How do I use nvidia-docker to create a service in Docker swarm mode? I am trying to train a TensorFlow model across this swarm network using distributed learning. One approach I found is to run a swarm of containers on different machines and use the GPU on each machine for distributed training. If this is not possible in swarm mode, is there another way to accomplish this?

docker service create --name tensorflow --network overnet saikishor/tfm:test
azt0tczwkxaqpkh9yaea4laq1
Since --detach=false was not specified, tasks will be created in the background.
In a future release, --detach=false will become the default

But docker service ls shows 0/1 replicas running:

ID            NAME        MODE        REPLICAS  IMAGE               PORTS
uf6jgp3tm6dp  tensorflow  replicated  0/1       saikishor/tfm:test

Upvotes: 3

Views: 1270

Answers (2)

Yan QiDong

Reputation: 4441

This was impossible when the question was asked, but it is possible now.

Since the release of nvidia-docker2, Docker supports a new container runtime, usually named nvidia. With it, docker run --runtime nvidia ... can access the GPU just like nvidia-docker run .... In addition, once the dockerd option --default-runtime nvidia is configured, tools such as docker-compose, Docker Swarm and Kubernetes can use the GPU too.
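A minimal sketch of both invocations follows. It reuses the image and network names from the question. The run helper and the RUN variable are hypothetical, added only so the script prints the commands by default instead of executing them. Passing NVIDIA_VISIBLE_DEVICES=all is an assumption for images that do not already set it, and nvidia-smi is only available inside the container if the runtime injects the driver utilities.

```shell
#!/bin/sh
# Dry-run sketch: prints each command; set RUN=1 on a GPU host to execute them.
IMAGE="saikishor/tfm:test"   # image name taken from the question

# Hypothetical helper: echo the command, and run it only when RUN=1.
run() {
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

# One-off container, selecting the nvidia runtime explicitly.
run docker run --rm --runtime nvidia -e NVIDIA_VISIBLE_DEVICES=all "$IMAGE" nvidia-smi

# Swarm service: to my knowledge docker service create takes no --runtime
# option, so this relies on "default-runtime": "nvidia" being set in the
# daemon config on every node that may run the task.
run docker service create --name tensorflow --network overnet \
    -e NVIDIA_VISIBLE_DEVICES=all "$IMAGE"
```

If the service still reports 0/1 replicas, docker service ps --no-trunc tensorflow usually shows the error from the failing task.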

nvidia-gpu-docker

Install

Debian-based distributions

curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | \
  sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
# Install the package that provides the nvidia runtime
sudo apt-get install -y nvidia-docker2

RHEL-based distributions

distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo | \
  sudo tee /etc/yum.repos.d/nvidia-docker.repo
# Install the package that provides the nvidia runtime
sudo yum install -y nvidia-docker2

Config (/etc/docker/daemon.json)

{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "default-runtime": "nvidia",
    ...
}
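As a hedged sketch of applying this config, the script below writes a minimal version of the file to a scratch path and checks that the default runtime is set. The DAEMON_JSON variable and the scratch filename are hypothetical. On a typical Linux host the real file is /etc/docker/daemon.json, and any keys already present there (the ... above) would need to be merged by hand rather than overwritten.

```shell
#!/bin/sh
# Write to a scratch file by default so nothing on the host is overwritten.
DAEMON_JSON="${DAEMON_JSON:-./daemon.json.example}"

cat > "$DAEMON_JSON" <<'EOF'
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "default-runtime": "nvidia"
}
EOF

# Sanity check: the default runtime entry is present in the generated file.
grep -q '"default-runtime": "nvidia"' "$DAEMON_JSON" \
    && echo "default runtime set in $DAEMON_JSON"

# Assumed follow-up on each swarm node, run manually after reviewing the file:
#   sudo cp "$DAEMON_JSON" /etc/docker/daemon.json
#   sudo systemctl restart docker
#   docker info | grep -i runtime    # nvidia should appear as the default
```

Once the daemon has been restarted on every node, the docker service create command from the question should start its task under the nvidia runtime without extra flags.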

Upvotes: 3

saikishor

Reputation: 928

As of now, nvidia-docker does not support Docker Swarm, so this is not currently possible. We need to create an external network to connect the containers together.

Upvotes: 0
