Reputation: 928
How do I use nvidia-docker to create a service in Docker swarm mode? I am trying to train a TensorFlow model across this swarm network to perform distributed learning. One approach could be to run a swarm of containers on different machines and use the GPU on each machine for distributed training. If that is not possible in swarm mode, is there any other way to accomplish this task?
docker service create --name tensorflow --network overnet saikishor/tfm:test
azt0tczwkxaqpkh9yaea4laq1
Since --detach=false was not specified, tasks will be created in the background.
In a future release, --detach=false will become the default.
but under docker service ls, I have this:
ID            NAME        MODE        REPLICAS  IMAGE               PORTS
uf6jgp3tm6dp  tensorflow  replicated  0/1       saikishor/tfm:test
Upvotes: 3
Views: 1270
Reputation: 4441
This was impossible when the question was asked, but it is possible now.
Since nvidia-docker2 was released, a new Docker container runtime, usually named nvidia, is supported. It lets docker run --runtime nvidia ... access the GPU just like nvidia-docker run ... does.
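For example, a quick GPU sanity check with the new runtime might look like this (the nvidia/cuda image is the usual choice for such a check, but any CUDA-enabled image works):

```shell
# Run nvidia-smi inside a container using the nvidia runtime explicitly;
# it should list the host's GPUs if the runtime is installed correctly.
docker run --runtime nvidia --rm nvidia/cuda nvidia-smi
```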
Additionally, once the dockerd option --default-runtime nvidia is configured, tools like docker-compose, Docker Swarm, and Kubernetes can use the GPU too.
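With the default runtime set to nvidia on every node, a swarm service is created the usual way and its containers can see the GPUs; a minimal sketch (the service name and replica count are illustrative, and the image is the one from the question):

```shell
# Run on a manager node; assumes dockerd on each worker already has
# nvidia configured as the default runtime, so no GPU flags are needed.
docker service create \
  --name tensorflow \
  --network overnet \
  --replicas 2 \
  saikishor/tfm:test
```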
# Ubuntu / Debian: add the nvidia-docker apt repository
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | \
  sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
# CentOS / RHEL: add the nvidia-docker yum repository
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo | \
  sudo tee /etc/yum.repos.d/nvidia-docker.repo
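The snippets above only add the package repositories; the nvidia-docker2 package itself still needs to be installed and dockerd reloaded. A sketch for the apt case (package and signal per the nvidia-docker2 install instructions):

```shell
# Install the nvidia-docker2 package, which ships the nvidia runtime
sudo apt-get install -y nvidia-docker2
# Reload dockerd so it picks up the new runtime configuration
sudo pkill -SIGHUP dockerd
```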
The nvidia runtime and the default runtime are configured in /etc/docker/daemon.json:
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "default-runtime": "nvidia",
    ...
}
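After editing daemon.json, restart Docker and verify that containers see the GPU without any extra flags (the nvidia/cuda image here is just a convenient way to run nvidia-smi):

```shell
# Restart the daemon so the default-runtime change takes effect
sudo systemctl restart docker
# With default-runtime set to nvidia, even a plain docker run sees the GPU
docker run --rm nvidia/cuda nvidia-smi
```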
Upvotes: 3
Reputation: 928
As of now, nvidia-docker does not support Docker swarm mode, so this is not possible yet. Instead, we would need to create an external network to plug the containers together.
Upvotes: 0