Reputation: 928
How do I use nvidia-docker to create a service in Docker swarm mode? I am trying to train a TensorFlow model across this swarm network to perform distributed learning. One approach could be to run a swarm of containers on different machines and use the GPU on each machine for distributed training. If that is not possible in swarm mode, is there any other way to accomplish this task?
docker service create --name tensorflow --network overnet saikishor/tfm:test
azt0tczwkxaqpkh9yaea4laq1
Since --detach=false was not specified, tasks will be created in the background.
In a future release, --detach=false will become the default.
but under docker service ls, I have this:
ID            NAME        MODE        REPLICAS  IMAGE               PORTS
uf6jgp3tm6dp  tensorflow  replicated  0/1       saikishor/tfm:test
Upvotes: 3
Views: 1270
Reputation: 4441
This was impossible when the question was asked, but it is possible now.
Since nvidia-docker2 was released, a new Docker container runtime, usually named nvidia, is supported. It lets docker run --runtime nvidia ... access the GPU just like nvidia-docker run ... does.
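For example, a quick GPU sanity check with the new runtime might look like this (the nvidia/cuda image is the usual choice for such a check, but any CUDA-enabled image works):

```shell
# Run nvidia-smi inside a container using the nvidia runtime explicitly;
# it should list the host's GPUs if the runtime is installed correctly.
docker run --runtime nvidia --rm nvidia/cuda nvidia-smi
```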
Additionally, once the dockerd option --default-runtime nvidia is configured, tools like docker-compose, Docker Swarm, and Kubernetes can use the GPU too.
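With the default runtime set to nvidia on every node, a swarm service is created the usual way and its containers can see the GPUs; a minimal sketch (the service name and replica count are illustrative, and the image is the one from the question):

```shell
# Run on a manager node; assumes dockerd on each worker already has
# nvidia configured as the default runtime, so no GPU flags are needed.
docker service create \
  --name tensorflow \
  --network overnet \
  --replicas 2 \
  saikishor/tfm:test
```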
# Ubuntu / Debian: add the nvidia-docker apt repository
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | \
  sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
# CentOS / RHEL: add the nvidia-docker yum repository
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo | \
  sudo tee /etc/yum.repos.d/nvidia-docker.repo
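The snippets above only add the package repositories; the nvidia-docker2 package itself still needs to be installed and dockerd reloaded. A sketch for the apt case (package and signal per the nvidia-docker2 install instructions):

```shell
# Install the nvidia-docker2 package, which ships the nvidia runtime
sudo apt-get install -y nvidia-docker2
# Reload dockerd so it picks up the new runtime configuration
sudo pkill -SIGHUP dockerd
```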
The nvidia runtime and the default runtime are configured in /etc/docker/daemon.json:
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "default-runtime": "nvidia",
    ...
}
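After editing daemon.json, restart Docker and verify that containers see the GPU without any extra flags (the nvidia/cuda image here is just a convenient way to run nvidia-smi):

```shell
# Restart the daemon so the default-runtime change takes effect
sudo systemctl restart docker
# With default-runtime set to nvidia, even a plain docker run sees the GPU
docker run --rm nvidia/cuda nvidia-smi
```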
Upvotes: 3
Reputation: 928
As of now, nvidia-docker does not support Docker swarm mode, so this is not possible yet. Instead, we would need to create an external network to plug the containers together.
Upvotes: 0