Reputation: 71
I am using distributed TensorFlow on AWS with GPUs. When I train the model on my local machine, I set ps_hosts/worker_hosts to something like 'localhost:2225'. What ps/worker hosts do I need to use in the case of AWS?
Upvotes: 4
Views: 196
Reputation: 145
When distributed TF code is run on a cluster, the other nodes can be reached via "private_ip:port_number". The problem with AWS is that the other nodes cannot be launched as easily, and extra configuration is needed (for example, the instances' security groups must allow traffic between each other on the chosen ports).
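To illustrate, here is a minimal sketch of building a cluster spec from EC2 private IPs, using the classic `tf.train.ClusterSpec` API that `ps_hosts`/`worker_hosts` strings feed into. The IP addresses and port below are placeholders, not real instances; substitute the private IPs shown in your EC2 console.

```python
import tensorflow as tf

# Placeholder private IPs from the VPC subnet (assumptions, not real
# addresses); 2222 is just an arbitrary port that must be open between
# the instances' security groups.
ps_hosts = ["172.31.10.1:2222"]
worker_hosts = ["172.31.10.2:2222", "172.31.10.3:2222"]

# Every node in the cluster builds the identical ClusterSpec; what
# differs per process is the job_name/task_index it then claims.
cluster = tf.train.ClusterSpec({"ps": ps_hosts, "worker": worker_hosts})
print(cluster.num_tasks("worker"))  # 2
```

Each process then starts a server for its own role, e.g. `tf.train.Server(cluster, job_name="worker", task_index=0)` in TF 1.x (`tf.distribute.Server` in TF 2.x), exactly as you would with `localhost` addresses locally.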
Upvotes: 0
Reputation: 1530
Here's a good GitHub project showing how to use distributed TensorFlow on AWS with Kubernetes or the new AWS SageMaker: https://github.com/pipelineai/pipeline
At a minimum, you should be using the TensorFlow Estimator API. There are lots of hidden, not-so-well-documented tricks to distributed TensorFlow.
Some of the better examples live here: https://github.com/GoogleCloudPlatform/cloudml-samples/tree/master/census
Upvotes: 2