Reputation: 10983
I would like to use multiple GPUs to train my Tensorflow model taking advantage of data parallelism.
I am currently training a Tensorflow model using the following approach:
x_ = tf.placeholder(...)
y_ = tf.placeholder(...)
y = model(x_)
loss = tf.losses.sparse_softmax_cross_entropy(labels=y_, logits=y)
optimizer = tf.train.AdamOptimizer()
train_op = tf.contrib.training.create_train_op(loss, optimizer)
for i in epochs:
    for b in data:
        _ = sess.run(train_op, feed_dict={x_: b.x, y_: b.y})
I would like to take advantage of multiple GPUs to train this model in a data-parallel manner, i.e. I would like to split my batches in half and run each half-batch on one of my two GPUs.
cifar10_multi_gpu_train seems to provide a good example of creating a loss that draws from graphs running on multiple GPUs, but I haven't found a good example of doing this style of training when using feed_dict and placeholder as opposed to a data loader queue.
UPDATE
Seems like https://timsainb.github.io/multi-gpu-vae-gan-in-tensorflow.html might provide a good example. They pull in average_gradients from cifar10_multi_gpu_train.py and create one placeholder which they then slice into for each of the GPUs. I think you also need to split create_train_op into three stages: compute_gradients, average_gradients and then apply_gradients.
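A minimal sketch of those three stages (untested; it assumes num_gpus GPUs, that model() creates its variables with tf.get_variable so they can be reused across towers, and that average_gradients is copied from cifar10_multi_gpu_train.py):

opt = tf.train.AdamOptimizer()
x_split = tf.split(x_, num_gpus)   # batch size must be divisible by num_gpus
y_split = tf.split(y_, num_gpus)
tower_grads = []
with tf.variable_scope(tf.get_variable_scope()):
    for i in range(num_gpus):
        with tf.device("/gpu:%d" % i):
            with tf.name_scope("tower_%d" % i):
                logits = model(x_split[i])
                loss = tf.losses.sparse_softmax_cross_entropy(labels=y_split[i], logits=logits)
                # stage 1: compute per-tower gradients
                tower_grads.append(opt.compute_gradients(loss))
                # share the same weights with the next tower
                tf.get_variable_scope().reuse_variables()
# stage 2: average the gradients across towers (helper from cifar10_multi_gpu_train.py)
grads = average_gradients(tower_grads)
# stage 3: apply the averaged gradients once
train_op = opt.apply_gradients(grads)

The training loop itself would stay the same as above; the only change is that each batch fed through feed_dict has to be divisible by num_gpus.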
Upvotes: 13
Views: 11355
Reputation: 57
I know three ways of feeding data to a multi-GPU model.
1. Placeholder x on the CPU, then use tf.split to split x into xs. Then on each GPU tower, take xs[i] as its input.

with tf.device("/cpu:0"):
    encoder_inputs = tf.placeholder(tf.int32, [None, None], name="encoder_inputs")
    encoder_length = tf.placeholder(tf.int32, [None], name="encoder_length")
    # make sure the batch size is divisible by num_gpus
    inputs = tf.split(encoder_inputs, num_gpus, axis=0)  # axis=0: split on the batch dimension
    lens = tf.split(encoder_length, num_gpus, axis=0)
with tf.variable_scope(tf.get_variable_scope()):
    for i in range(num_gpus):
        with tf.device("/gpu:%d" % i):
            with tf.name_scope("tower_%d" % i):
                loss = compute_loss(inputs[i], lens[i])
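A rough sketch of how this could be driven (not part of the original answer; batch_x and batch_len are assumed numpy arrays, and tower_losses is assumed to collect the per-tower loss tensors built above):

# hypothetical: tower_losses collected while building the towers above
total_loss = tf.reduce_mean(tf.stack(tower_losses))   # average across towers
train_op = tf.train.AdamOptimizer().minimize(total_loss)

# feed the whole batch once; the tf.split above fans it out to the GPUs
sess.run(train_op, feed_dict={encoder_inputs: batch_x, encoder_length: batch_len})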
2. Placeholder x on every GPU, each under its own scope.
def init_placeholder(self):
    with tf.variable_scope("inputs"):  # use a scope
        encoder_inputs = tf.placeholder(tf.int32, [None, None], name="encoder_inputs")
        encoder_length = tf.placeholder(tf.int32, [None], name="encoder_length")
    return encoder_inputs, encoder_length
with tf.variable_scope(tf.get_variable_scope()):
    for g, gpu in enumerate(GPUS):
        with tf.device("/gpu:%d" % gpu):
            with tf.name_scope("tower_%d" % g):
                x, x_len = model.init_placeholder()  # these placeholder tensors live on the GPU
                loss = model.compute_loss(x, x_len)
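Since each tower owns its own placeholders here, the numpy batch has to be sliced in Python. A sketch (not from the original answer; it assumes the pairs returned by init_placeholder were saved into a list called tower_placeholders and that batch_x, batch_len are numpy arrays):

feed = {}
per_gpu = batch_x.shape[0] // len(GPUS)     # batch size must divide evenly across GPUs
for g in range(len(GPUS)):
    x_ph, len_ph = tower_placeholders[g]    # placeholders built on tower g
    feed[x_ph] = batch_x[g * per_gpu:(g + 1) * per_gpu]
    feed[len_ph] = batch_len[g * per_gpu:(g + 1) * per_gpu]
sess.run(train_op, feed_dict=feed)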
3. Use tf.data.Dataset to feed data. The official Google cifar10_multi_gpu_train.py uses a Queue, which is similar to this approach.
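A rough sketch of that third way (names such as features, lengths, compute_loss, batch_size and num_gpus are assumed, not from the answer): build one Dataset and call get_next() once per tower, so each GPU dequeues its own sub-batch without any feed_dict.

dataset = tf.data.Dataset.from_tensor_slices((features, lengths))
dataset = dataset.shuffle(10000).repeat().batch(batch_size // num_gpus)
iterator = dataset.make_one_shot_iterator()

tower_losses = []
with tf.variable_scope(tf.get_variable_scope()):
    for i in range(num_gpus):
        with tf.device("/gpu:%d" % i):
            with tf.name_scope("tower_%d" % i):
                x, x_len = iterator.get_next()  # each call pulls a fresh sub-batch
                tower_losses.append(compute_loss(x, x_len))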
Upvotes: 1