Reputation: 214
I am using deep learning to do image recognition on a large data set with 100 categories (comparable in size to CIFAR-100). I am currently tuning the hyperparameters on a single machine without a GPU, and training is extremely slow. Is there any existing method to do the training on an EC2 Spark cluster? I know about SparkNet, but it seems to only support Caffe.
Upvotes: 3
Views: 659
Reputation: 126184
There have been a couple of recent developments that make it possible to reuse your Spark cluster for training with TensorFlow:
Yahoo! published TensorFlowOnSpark, which uses Spark to manage a distributed TensorFlow cluster for you and helps with issues like data ingestion, startup, and shutdown (see the launch sketch after these two options).
If you are running Spark on a Mesos cluster, you can follow the instructions here to run TensorFlow on the same cluster.
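To make the first option concrete, here is a minimal sketch of how TensorFlowOnSpark is typically launched from a PySpark driver, assuming its TFCluster API; the executor counts and the trivial main_fun body are placeholder assumptions, not part of this answer:
from tensorflowonspark import TFCluster

def main_fun(args, ctx):
    # ctx carries the cluster spec, job name, and task index for this executor;
    # a real training function would build its graph and train here.
    import tensorflow as tf
    print("task %d of job %s" % (ctx.task_index, ctx.job_name))

# Reserve Spark executors as TensorFlow workers and parameter servers
# (4 workers + 1 PS here is an illustrative assumption).
cluster = TFCluster.run(sc, main_fun, tf_args=None,
                        num_executors=4, num_ps=1,
                        tensorboard=False,
                        input_mode=TFCluster.InputMode.TENSORFLOW)
cluster.shutdown()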
Upvotes: 1
Reputation: 2659
As @Ramon commented, Spark with TensorFlow can be used for hyperparameter tuning by broadcasting the parameters to the workers. See this example from Databricks:
def map_fun(i):
    # Import TensorFlow inside the function so the import happens on each executor.
    import tensorflow as tf
    with tf.Graph().as_default() as g:
        hello = tf.constant('Hello, TensorFlow!', name="hello_constant")
        with tf.Session() as sess:
            return sess.run(hello)

rdd = sc.parallelize(range(10))
rdd.map(map_fun).collect()
Output:
['Hello, TensorFlow!',
'Hello, TensorFlow!',
'Hello, TensorFlow!',
'Hello, TensorFlow!',
'Hello, TensorFlow!',
'Hello, TensorFlow!',
'Hello, TensorFlow!',
'Hello, TensorFlow!',
'Hello, TensorFlow!',
'Hello, TensorFlow!']
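Since the point of this pattern is hyperparameter tuning, here is a hedged sketch extending it to a parallel grid search; the grid values and the load_validation_sample and train_and_eval helpers are hypothetical, standing in for your own data loading and training code:
import itertools

# Hypothetical hyperparameter grid; the values are illustrative only.
grid = [{'lr': lr, 'batch_size': bs}
        for lr, bs in itertools.product([0.01, 0.001], [32, 64])]

# Broadcast large read-only state (e.g. a validation sample) once,
# instead of shipping it with every task.
val_data_bc = sc.broadcast(load_validation_sample())  # hypothetical loader

def evaluate(params):
    # Each executor trains a model with its own hyperparameters
    # and returns the resulting validation score.
    import tensorflow as tf
    score = train_and_eval(params, val_data_bc.value)  # hypothetical helper
    return (params, score)

results = sc.parallelize(grid, numSlices=len(grid)).map(evaluate).collect()
best_params, best_score = max(results, key=lambda r: r[1])
Using one partition per configuration (numSlices=len(grid)) lets each trial run in parallel on its own executor, which is where the speedup over a single machine comes from.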
Upvotes: 1