Zhenyu Wu

Reputation: 214

How to train a deep neural network (tensorflow) on EC2 Spark cluster?

I am using deep learning to do image recognition on a large data set with 100 categories (comparable in size to CIFAR-100). I am currently tuning the hyperparameters on a single machine without a GPU, and training is extremely slow. Is there any existing method to do the training on an EC2 Spark cluster? I know there is SparkNet, but it seems to only support Caffe.

Upvotes: 3

Views: 659

Answers (2)

mrry

Reputation: 126184

There have been a couple of recent developments that make it possible to reuse your Spark cluster for training with TensorFlow:

  • Yahoo! published TensorFlowOnSpark, which uses Spark to manage a distributed TensorFlow cluster for you, and helps with issues like data ingestion, startup and shutdown.

  • If you are running Spark on a Mesos cluster, you can follow the instructions here to run TensorFlow on the same cluster.

Upvotes: 1

Himaprasoon

Reputation: 2659

As @Ramon commented, Spark can be used with TensorFlow for hyperparameter tuning by broadcasting the parameters. See this example from Databricks:

def map_fun(i):
  # Import inside the function so TensorFlow is imported on each Spark
  # executor, not just on the driver.
  import tensorflow as tf
  # Build a fresh graph per task to avoid state leaking between runs.
  with tf.Graph().as_default() as g:
    hello = tf.constant('Hello, TensorFlow!', name="hello_constant")
    with tf.Session() as sess:
      return sess.run(hello)

rdd = sc.parallelize(range(10))
rdd.map(map_fun).collect()

Output:

['Hello, TensorFlow!',
 'Hello, TensorFlow!',
 'Hello, TensorFlow!',
 'Hello, TensorFlow!',
 'Hello, TensorFlow!',
 'Hello, TensorFlow!',
 'Hello, TensorFlow!',
 'Hello, TensorFlow!',
 'Hello, TensorFlow!',
 'Hello, TensorFlow!']
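
The example above runs identical copies of the same graph; for hyperparameter tuning you would instead parallelize a grid of candidate settings, train one model per setting inside the mapped function, and collect the losses on the driver. Below is a minimal sketch of that pattern. To keep it self-contained, a toy pure-Python gradient-descent loop stands in for a real TensorFlow model, and the built-in `map` stands in for `rdd.map` — the names `train` and `grid` are illustrative, not part of any API:

```python
# Sketch of the parallelize-and-map hyperparameter search pattern.
# On a real cluster this would be sc.parallelize(grid).map(train).collect(),
# with a TensorFlow model built inside train(); here a toy gradient-descent
# loop minimizing f(w) = (w - 3)^2 stands in for model training.

def train(learning_rate, steps=50):
    """Run gradient descent from w = 0 and return (learning_rate, final loss)."""
    w = 0.0
    for _ in range(steps):
        grad = 2 * (w - 3)      # derivative of (w - 3)^2
        w -= learning_rate * grad
    return learning_rate, (w - 3) ** 2

# Candidate hyperparameters to try, one per Spark task.
grid = [0.001, 0.01, 0.1, 0.5]
results = list(map(train, grid))   # cluster version: rdd.map(train).collect()

# The driver picks the setting with the lowest loss.
best_lr, best_loss = min(results, key=lambda r: r[1])
```

On a real cluster you would also broadcast any large read-only data, such as the training set, with `sc.broadcast` so that each executor receives a single copy instead of one per task.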

Upvotes: 1
