Alex Rothberg

Reputation: 10983

Multi GPU Training in Tensorflow (Data Parallelism) when Using feed_dict

I would like to use multiple GPUs to train my TensorFlow model, taking advantage of data parallelism.

I am currently training a TensorFlow model using the following approach:

x_ = tf.placeholder(...)
y_ = tf.placeholder(...)
y = model(x_)
loss = tf.losses.sparse_softmax_cross_entropy(labels=y_, logits=y)
optimizer = tf.train.AdamOptimizer()
train_op = tf.contrib.training.create_train_op(loss, optimizer)
for i in range(epochs):
    for b in data:
        _ = sess.run(train_op, feed_dict={x_: b.x, y_: b.y})

I would like to take advantage of multiple GPUs to train this model in a data-parallel manner, i.e. split each batch in half and run each half-batch on one of my two GPUs.

cifar10_multi_gpu_train seems to provide a good example of creating a loss that draws from graphs running on multiple GPUs, but I haven't found a good example of doing this style of training when using feed_dict and placeholders as opposed to a data loader queue.

UPDATE

It seems like https://timsainb.github.io/multi-gpu-vae-gan-in-tensorflow.html might provide a good example. They pull in average_gradients from cifar10_multi_gpu_train.py and create a single placeholder which they then slice into, one slice per GPU. I think you also need to split create_train_op into three stages: compute_gradients, average_gradients, and then apply_gradients. Concretely, I have something like the sketch below in mind.
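A minimal sketch of that three-stage approach, assuming two GPUs, the model() and placeholders from my snippet above (with model() building its variables via tf.get_variable so they can be reused across towers), and the average_gradients helper copied verbatim from cifar10_multi_gpu_train.py:

import tensorflow as tf

num_gpus = 2

x_ = tf.placeholder(...)
y_ = tf.placeholder(...)

# Slice the fed batch into one shard per GPU
# (the batch size must be divisible by num_gpus).
x_shards = tf.split(x_, num_gpus, axis=0)
y_shards = tf.split(y_, num_gpus, axis=0)

optimizer = tf.train.AdamOptimizer()
tower_grads = []
with tf.variable_scope(tf.get_variable_scope()):
    for i in range(num_gpus):
        with tf.device("/gpu:%d" % i), tf.name_scope("tower_%d" % i):
            logits = model(x_shards[i])
            loss = tf.losses.sparse_softmax_cross_entropy(
                labels=y_shards[i], logits=logits)
            # Stage 1: per-tower gradients.
            tower_grads.append(optimizer.compute_gradients(loss))
            tf.get_variable_scope().reuse_variables()  # share weights across towers

# Stage 2: average the gradients across towers (helper from cifar10_multi_gpu_train.py).
grads = average_gradients(tower_grads)
# Stage 3: apply the averaged gradients once.
train_op = optimizer.apply_gradients(grads)

The training loop itself is unchanged: sess.run(train_op, feed_dict={x_: b.x, y_: b.y}) with the full batch, and the graph does the splitting.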

Upvotes: 13

Views: 11355

Answers (1)

huosan0123

Reputation: 57

I know three ways of feeding data to a multi-GPU model.

  1. If all your inputs are of the same shape, you can build the placeholder x on the CPU, then use tf.split to split x into xs. On each GPU tower, take xs[i] as that tower's input:
with tf.device("/cpu:0"):
    encoder_inputs = tf.placeholder(tf.int32, [None, None], name="encoder_inputs")
    encoder_length = tf.placeholder(tf.int32, [None,], name="encoder_length")

    # make sure batch % num_gpus == 0
    inputs = tf.split(encoder_inputs, num_gpus, axis=0)  # axis=0: split on the batch dimension
    lens = tf.split(encoder_length, num_gpus, axis=0)

with tf.variable_scope(tf.get_variable_scope()):
    for i in range(num_gpus):
        with tf.device("/gpu:%d" % i):
            with tf.name_scope("tower_%d" % i):
                loss = compute_loss(inputs[i], lens[i])
                # needed if compute_loss creates its variables with tf.get_variable
                tf.get_variable_scope().reuse_variables()
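With this graph you still feed one full batch; tf.split hands each tower its shard. A usage sketch, where batch_x / batch_len are the host-side arrays and train_op is built from the tower losses:

# Feed the whole batch once; tf.split distributes it across the towers.
sess.run(train_op, feed_dict={encoder_inputs: batch_x, encoder_length: batch_len})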

  2. If your inputs have different shapes, you need to build a placeholder x on every GPU, inside a scope:

def init_placeholder(self):
    with tf.variable_scope("inputs"):   # use a scope so each tower's inputs get distinct names
        encoder_inputs = tf.placeholder(tf.int32, [None, None], name="encoder_inputs")
        encoder_length = tf.placeholder(tf.int32, [None,], name="encoder_length")
    return encoder_inputs, encoder_length

with tf.variable_scope(tf.get_variable_scope()):
    for g, gpu in enumerate(GPUS):
        with tf.device("/gpu:%d" % gpu):
            with tf.name_scope("tower_%d" % g):
                x, x_len = model.init_placeholder()  # these placeholder tensors live on the GPU
                loss = model.compute_loss(x, x_len)
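Feeding then maps each tower's placeholders to its own shard of the batch. A sketch, assuming the per-tower placeholders were collected into a tower_placeholders list in the loop above and the batch is split on the host with numpy:

import numpy as np

feed = {}
for g, (x, x_len) in enumerate(tower_placeholders):
    # Give tower g its own shard of the host-side batch.
    feed[x] = np.array_split(batch_x, len(GPUS))[g]
    feed[x_len] = np.array_split(batch_len, len(GPUS))[g]
sess.run(train_op, feed_dict=feed)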
  3. Use tf.data.Dataset to feed data; Google's official cifar10_multi_gpu_train.py uses a queue, which is similar in spirit. A minimal sketch follows below.
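A minimal sketch of the tf.data route (TF 1.x style; features, labels, batch_size, num_gpus and compute_loss are stand-ins). Each tower's get_next() op dequeues its own batch:

dataset = (tf.data.Dataset.from_tensor_slices((features, labels))
           .shuffle(10000)
           .repeat()
           .batch(batch_size))
iterator = dataset.make_one_shot_iterator()

for i in range(num_gpus):
    with tf.device("/gpu:%d" % i):
        with tf.name_scope("tower_%d" % i):
            x, y = iterator.get_next()  # each call creates a separate dequeue op
            loss = compute_loss(x, y)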

Upvotes: 1
