安江重信

Reputation: 21

How do you run distributed TensorFlow on GKE?

I want to run distributed TensorFlow on GKE. I am looking for a sample that covers the whole process, from setting up GKE to running distributed TensorFlow. Do you know of a good one?

Upvotes: 2

Views: 587

Answers (2)

Jeremy Lewi

Reputation: 6776

If you want to run TensorFlow on Google Cloud Platform, one option is Google Cloud Machine Learning.

Upvotes: 0

mrry

Reputation: 126154

A recent workshop (slides) at OSCON and PyCon covered (among other things) running distributed TensorFlow on Kubernetes. There is a GitHub repository including the necessary configuration scripts and a Jupyter notebook that can be used to interact with the cluster.

See the workshop for more details, but the basic idea is that the master, each worker, and each parameter server each run in a separate Kubernetes replication controller of size 1. Kubernetes gives stable names to each of these processes, which you can use to build a tf.train.ClusterSpec and interconnect the different processes.
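As a rough illustration of that last step, here is a minimal sketch of turning stable names into the dictionary that tf.train.ClusterSpec accepts. The hostnames (master-0, worker-N, ps-N), the port 2222, and the helper name build_cluster_def are assumptions made for this example, not the names used in the workshop repository; substitute whatever service names your Kubernetes configuration actually exposes.

```python
def build_cluster_def(num_workers, num_ps, port=2222):
    """Map each job name to the list of "host:port" addresses of its tasks.

    The hostnames and the port are hypothetical; they stand in for the
    stable names that Kubernetes assigns to each process.
    """
    return {
        "master": ["master-0:%d" % port],
        "worker": ["worker-%d:%d" % (i, port) for i in range(num_workers)],
        "ps": ["ps-%d:%d" % (i, port) for i in range(num_ps)],
    }


cluster_def = build_cluster_def(num_workers=2, num_ps=1)

# With TensorFlow installed, the dictionary can be passed straight to
# the ClusterSpec constructor:
#   import tensorflow as tf
#   cluster = tf.train.ClusterSpec(cluster_def)
```

Each process would build the same dictionary and then identify itself by its own job name and task index, so the only per-process configuration is which entry it is.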

There are other ways to set up a cluster, which require more configuration, but the tutorial gives a nice introduction to setting up synchronous training on a word2vec model.

Upvotes: 2
