Alex Rothberg

Reputation: 10983

Multi GPU Training in Tensorflow (Data Parallelism) when Using feed_dict

I would like to use multiple GPUs to train my TensorFlow model, taking advantage of data parallelism.

I am currently training a TensorFlow model using the following approach:

x_ = tf.placeholder(...)
y_ = tf.placeholder(...)
y = model(x_)
loss = tf.losses.sparse_softmax_cross_entropy(labels=y_, logits=y)
optimizer = tf.train.AdamOptimizer()
train_op = tf.contrib.training.create_train_op(loss, optimizer)
for i in range(epochs):
    for b in data:
        _ = sess.run(train_op, feed_dict={x_: b.x, y_: b.y})

I would like to take advantage of multiple GPUs to train this model in a data-parallel manner, i.e. split each batch in half and run each half-batch on one of my two GPUs.

cifar10_multi_gpu_train seems to provide a good example of creating a loss that draws from graphs running on multiple GPUs, but I haven't found a good example of doing this style of training when using feed_dict and placeholders as opposed to a data loader queue.

UPDATE

It seems like https://timsainb.github.io/multi-gpu-vae-gan-in-tensorflow.html might provide a good example. They pull in average_gradients from cifar10_multi_gpu_train.py and create a single placeholder which they then slice into, one slice per GPU. I think you also need to split create_train_op into three stages: compute_gradients, average_gradients, and then apply_gradients. Concretely, I have something like the sketch below in mind.
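A minimal sketch of that three-stage approach, assuming two GPUs, the model() and placeholders from my snippet above (with model() building its variables via tf.get_variable so they can be reused across towers), and the average_gradients helper copied verbatim from cifar10_multi_gpu_train.py:

import tensorflow as tf

num_gpus = 2

x_ = tf.placeholder(...)
y_ = tf.placeholder(...)

# Slice the fed batch into one shard per GPU
# (the batch size must be divisible by num_gpus).
x_shards = tf.split(x_, num_gpus, axis=0)
y_shards = tf.split(y_, num_gpus, axis=0)

optimizer = tf.train.AdamOptimizer()
tower_grads = []
with tf.variable_scope(tf.get_variable_scope()):
    for i in range(num_gpus):
        with tf.device("/gpu:%d" % i), tf.name_scope("tower_%d" % i):
            logits = model(x_shards[i])
            loss = tf.losses.sparse_softmax_cross_entropy(
                labels=y_shards[i], logits=logits)
            # Stage 1: per-tower gradients.
            tower_grads.append(optimizer.compute_gradients(loss))
            tf.get_variable_scope().reuse_variables()  # share weights across towers

# Stage 2: average the gradients across towers (helper from cifar10_multi_gpu_train.py).
grads = average_gradients(tower_grads)
# Stage 3: apply the averaged gradients once.
train_op = optimizer.apply_gradients(grads)

The training loop itself is unchanged: sess.run(train_op, feed_dict={x_: b.x, y_: b.y}) with the full batch, and the graph does the splitting.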

Upvotes: 13

Views: 11355

Answers (1)

huosan0123

Reputation: 57

I know three ways of feeding data to a multi-GPU model.

  1. If all your inputs are of the same shape, you can build the placeholder x on the CPU, then use tf.split to split x into xs. On each GPU tower, take xs[i] as that tower's input:
with tf.device("/cpu:0"):
    encoder_inputs = tf.placeholder(tf.int32, [None, None], name="encoder_inputs")
    encoder_length = tf.placeholder(tf.int32, [None,], name="encoder_length")

    # make sure batch % num_gpus == 0
    inputs = tf.split(encoder_inputs, num_gpus, axis=0)  # axis=0: split on the batch dimension
    lens = tf.split(encoder_length, num_gpus, axis=0)

with tf.variable_scope(tf.get_variable_scope()):
    for i in range(num_gpus):
        with tf.device("/gpu:%d" % i):
            with tf.name_scope("tower_%d" % i):
                loss = compute_loss(inputs[i], lens[i])
                # needed if compute_loss creates its variables with tf.get_variable
                tf.get_variable_scope().reuse_variables()
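With this graph you still feed one full batch; tf.split hands each tower its shard. A usage sketch, where batch_x / batch_len are the host-side arrays and train_op is built from the tower losses:

# Feed the whole batch once; tf.split distributes it across the towers.
sess.run(train_op, feed_dict={encoder_inputs: batch_x, encoder_length: batch_len})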

  2. If your inputs have different shapes, you need to build a placeholder x on every GPU, inside a scope:

def init_placeholder(self):
    with tf.variable_scope("inputs"):   # use a scope so each tower's inputs get distinct names
        encoder_inputs = tf.placeholder(tf.int32, [None, None], name="encoder_inputs")
        encoder_length = tf.placeholder(tf.int32, [None,], name="encoder_length")
    return encoder_inputs, encoder_length

with tf.variable_scope(tf.get_variable_scope()):
    for g, gpu in enumerate(GPUS):
        with tf.device("/gpu:%d" % gpu):
            with tf.name_scope("tower_%d" % g):
                x, x_len = model.init_placeholder()  # these placeholder tensors live on the GPU
                loss = model.compute_loss(x, x_len)
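Feeding then maps each tower's placeholders to its own shard of the batch. A sketch, assuming the per-tower placeholders were collected into a tower_placeholders list in the loop above and the batch is split on the host with numpy:

import numpy as np

feed = {}
for g, (x, x_len) in enumerate(tower_placeholders):
    # Give tower g its own shard of the host-side batch.
    feed[x] = np.array_split(batch_x, len(GPUS))[g]
    feed[x_len] = np.array_split(batch_len, len(GPUS))[g]
sess.run(train_op, feed_dict=feed)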
  3. Use tf.data.Dataset to feed data; Google's official cifar10_multi_gpu_train.py uses a queue, which is similar in spirit. A minimal sketch follows below.
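A minimal sketch of the tf.data route (TF 1.x style; features, labels, batch_size, num_gpus and compute_loss are stand-ins). Each tower's get_next() op dequeues its own batch:

dataset = (tf.data.Dataset.from_tensor_slices((features, labels))
           .shuffle(10000)
           .repeat()
           .batch(batch_size))
iterator = dataset.make_one_shot_iterator()

for i in range(num_gpus):
    with tf.device("/gpu:%d" % i):
        with tf.name_scope("tower_%d" % i):
            x, y = iterator.get_next()  # each call creates a separate dequeue op
            loss = compute_loss(x, y)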

Upvotes: 1
