Miguel Monteiro

Reputation: 379

TensorFlow 1.4 new Dataset API in Distributed Mode

Before the new Dataset API in TensorFlow 1.4, I was using the following code to create a queue of filenames shared between different workers:

import tensorflow as tf

# queue of filenames that can be shared amongst workers during training
filename_queue = tf.FIFOQueue(100, tf.string, shared_name=shared_name)
enqueue_op = filename_queue.enqueue_many([tf.train.limit_epochs(file_names, num_epochs)])
close_op = filename_queue.close(cancel_pending_enqueues=True)

# create a queue runner and register it with the graph's queue runners
qr = tf.train.QueueRunner(filename_queue, [enqueue_op], close_op,
                          queue_closed_exception_types=(tf.errors.OutOfRangeError, tf.errors.CancelledError))
tf.train.add_queue_runner(qr)

# read a serialized example from the next file in the queue
reader = tf.TFRecordReader()
_, example = reader.read(filename_queue)

# parse the serialized example into tensors
image, ground_truth, example_name = parse_example(example)

This code uses queues and queue runners, which is quite ugly and confusing, but the shared_name= option let me create a single queue shared between workers so that they wouldn't process the same examples.

With the release of TensorFlow 1.4, input pipelines have become much easier to use, so I want to update my program to use this new feature. However, I can't find anywhere in the new documentation how to share a Dataset between workers.
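
For reference, this is roughly what the equivalent single-worker pipeline looks like with the new API (a minimal sketch; file_names, num_epochs and parse_example are the same as in the snippet above, and nothing is shared between workers yet):

import tensorflow as tf

# dataset of filenames, read as TFRecords and parsed into tensors
dataset = tf.data.Dataset.from_tensor_slices(file_names)
dataset = dataset.flat_map(tf.data.TFRecordDataset)
dataset = dataset.map(parse_example)
dataset = dataset.repeat(num_epochs)

# a one-shot iterator replaces the reader + queue runner machinery
iterator = dataset.make_one_shot_iterator()
image, ground_truth, example_name = iterator.get_next()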

Is this done automatically, or is it not a feature?

Upvotes: 1

Views: 919

Answers (1)

jsimsa

Reputation: 315

You can use tf.data.Dataset.shard (see the documentation) for this purpose. The documentation illustrates how to "shard" the elements of a single file or (as in your example) "shard" the filenames.
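
For example, to shard the filenames as your queue did, something like the following should work (a minimal sketch; num_workers and worker_index are placeholders you would take from your cluster configuration, and parse_example is the function from your question):

import tensorflow as tf

# each worker keeps every num_workers-th filename, offset by its own index,
# so no two workers ever read the same file
dataset = tf.data.Dataset.from_tensor_slices(file_names)
dataset = dataset.shard(num_workers, worker_index)

# read and parse only this worker's share of the files
dataset = dataset.flat_map(tf.data.TFRecordDataset)
dataset = dataset.map(parse_example)
dataset = dataset.repeat(num_epochs)

iterator = dataset.make_one_shot_iterator()
image, ground_truth, example_name = iterator.get_next()

Unlike the shared queue, each worker computes its own subset of the filenames deterministically, so no runtime coordination between workers is required.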

Upvotes: 1
