M_T_JABER

Reputation: 71

Distributed TensorFlow: which ps/worker hosts to use on AWS?

I am running distributed TensorFlow on AWS GPU instances. When I train the model on my local machine, I set ps_host/workers_host to something like 'localhost:2225'. Which ps/worker host addresses do I need to use on AWS?

Upvotes: 4

Views: 196

Answers (2)

Kevin

Reputation: 145

When distributed TF code runs on a cluster, each node can reach the others at "private IP:port number".

The difficulty with AWS is that the other nodes are not easy to launch, and bringing them up needs extra configuration.
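As a rough illustration of the "private IP:port" addressing, here is a minimal sketch that builds the ps/worker host lists. The IP addresses and the `build_cluster` helper are hypothetical; port 2225 is only reused from the question, and the sketch assumes all instances sit in the same VPC.

```python
def build_cluster(ps_ips, worker_ips, port=2225):
    """Return a dict mapping job names to lists of 'ip:port' addresses."""
    return {
        "ps": ["{}:{}".format(ip, port) for ip in ps_ips],
        "worker": ["{}:{}".format(ip, port) for ip in worker_ips],
    }


# Hypothetical private IPs of the EC2 instances (not taken from the question).
cluster = build_cluster(
    ps_ips=["10.0.0.10"],
    worker_ips=["10.0.0.11", "10.0.0.12"],
)

# Comma-separated strings, in the same shape as the 'localhost:2225' flags.
ps_hosts = ",".join(cluster["ps"])
worker_hosts = ",".join(cluster["worker"])
print(ps_hosts)
print(worker_hosts)
```

The `cluster` dict has the shape that `tf.train.ClusterSpec` accepts. Every instance would use the same host lists and differ only in its own job name and task index. The security group also has to allow inbound traffic on the chosen port between the instances.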

Upvotes: 0

Chris Fregly

Reputation: 1530

Here's a good GitHub project showing how to use Distributed TensorFlow on AWS with Kubernetes or the new AWS SageMaker: https://github.com/pipelineai/pipeline

At minimum, you should be using the TensorFlow Estimator API. There are lots of hidden, not-so-well-documented tricks to Distributed TensorFlow.
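One of those less obvious details is that Estimator-based distributed training picks up the cluster layout from the `TF_CONFIG` environment variable. A minimal sketch of setting it on one worker, assuming hypothetical private IPs and a hypothetical `make_tf_config` helper:

```python
import json
import os


def make_tf_config(cluster, task_type, task_index):
    """Serialize the cluster layout and this node's role as a JSON string."""
    return json.dumps({
        "cluster": cluster,
        "task": {"type": task_type, "index": task_index},
    })


# Hypothetical addresses; replace with the private IPs of your instances.
cluster = {
    "chief": ["10.0.0.10:2225"],
    "worker": ["10.0.0.11:2225", "10.0.0.12:2225"],
    "ps": ["10.0.0.13:2225"],
}

# This process plays the role of worker 0; each node sets its own type/index.
os.environ["TF_CONFIG"] = make_tf_config(cluster, "worker", 0)
print(os.environ["TF_CONFIG"])
```

The variable has to be set before the Estimator's run configuration is created, and the `cluster` part should be identical on every node.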

Some of the better examples live here: https://github.com/GoogleCloudPlatform/cloudml-samples/tree/master/census

Upvotes: 2
