Meuu

Reputation: 2013

How to share a queue containing variable-length sequence batches between multiple GPUs?

According to Tensorflow: Multi-GPU single input queue, it might be better to have a single queue shared by multiple GPUs, and the linked answer suggests increasing the batch size and then splitting the batch ourselves. However, when the input data are variable-length sequences, increasing the batch size can result in many zero-padded values.

For example, if we create a 4-sequence batch and split it, both halves end up padded to the longest sequence in the whole batch (values below are only illustrative):

/gpu:0

[[1 2 3 0]
 [4 5 0 0]]

/gpu:1

[[ 6  7  8  9]
 [10  0  0  0]]

My question is: how to produce batches in which each split is padded only to its own longest sequence, like:

/gpu:0

[[1 2 3]
 [4 5 0]]

/gpu:1

[[ 6  7  8  9]
 [10  0  0  0]]

Following slim, I tried using tf.train.batch(data, batch_size=2, dynamic_pad=True) to create batches, putting the batches into a tf.PaddingFIFOQueue, and then calling tf.PaddingFIFOQueue.dequeue() on different GPUs, roughly as sketched below. However, it seems that all GPUs get the same data on the latest tensorflow (master).
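For concreteness, here is a rough sketch of that pipeline with toy random-length data; the data source, capacities, and num_gpus are placeholders, not my real input pipeline:

import tensorflow as tf

# Toy source of variable-length int64 sequences (illustrative data only).
length = tf.random_uniform([], minval=1, maxval=6, dtype=tf.int32)
sequence = tf.ones([length], dtype=tf.int64)

# dynamic_pad=True pads each 2-sequence batch only to the longest
# sequence inside that batch.
padded_batch = tf.train.batch([sequence], batch_size=2, dynamic_pad=True)

# Re-queue the already-padded batches in a shared PaddingFIFOQueue so that
# each GPU can (ideally) dequeue its own, independently padded batch.
batch_queue = tf.PaddingFIFOQueue(capacity=10, dtypes=[tf.int64],
                                  shapes=[[2, None]])
enqueue_op = batch_queue.enqueue([padded_batch])
tf.train.add_queue_runner(tf.train.QueueRunner(batch_queue, [enqueue_op]))

# One dequeue per GPU tower (remember to tf.train.start_queue_runners()
# before evaluating these).
num_gpus = 2
tower_batches = []
for gpu_index in range(num_gpus):
    with tf.device('/gpu:{}'.format(gpu_index)):
        tower_batches.append(batch_queue.dequeue())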

The following code demonstrates the issue:

import tensorflow as tf

capacity = 10
queue = tf.FIFOQueue(capacity, tf.int64)
# Fill the shared queue with the values 0..9.
enqueue = queue.enqueue_many((list(range(capacity)),))

def clone_fn():
    # Each clone (one per GPU) should pop its own element from the shared queue.
    clone_data = queue.dequeue()
    return clone_data

num_gpus = 2
all_clones_data = []
for gpu_index in range(num_gpus):
    with tf.device('/gpu:{}'.format(gpu_index)):
        all_clones_data.append(clone_fn())

with tf.Session(config=tf.ConfigProto(allow_soft_placement=True)) as sess:
    sess.run(enqueue)
    # The two dequeue ops should return two different elements.
    print(sess.run(all_clones_data))

On the latest tensorflow, the output is [0, 0].

On the older tensorflow (0.11), the output is [1, 0], which is what I want.

It seems slim also fetches the same data across all GPUs with the latest tensorflow.

Is there any better way to share a queue containing variable length sequences between multiple GPUs?

Upvotes: 1

Views: 348

Answers (1)

Yaroslav Bulatov

Reputation: 57883

Try running with

config = tf.ConfigProto(
    graph_options=tf.GraphOptions(
        optimizer_options=tf.OptimizerOptions(
            opt_level=tf.OptimizerOptions.L0)))

This is a bit counter-intuitive: the graph optimizer appears to merge the two identical dequeue ops into one, so disabling optimization (opt_level L0) keeps them separate. Filed issue 7038 for this.
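For instance, applied to the snippet in the question (a sketch; it reuses enqueue and all_clones_data from that snippet and keeps allow_soft_placement):

import tensorflow as tf

config = tf.ConfigProto(
    allow_soft_placement=True,
    graph_options=tf.GraphOptions(
        optimizer_options=tf.OptimizerOptions(
            opt_level=tf.OptimizerOptions.L0)))

with tf.Session(config=config) as sess:
    sess.run(enqueue)
    # With graph optimizations disabled, the two dequeue ops should no longer
    # be merged, so the clones should receive different elements.
    print(sess.run(all_clones_data))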

Upvotes: 1
