Reputation: 479
images, labels = tf.train.batch([image, label], batch_size=32, num_threads=4)
I often see a queue created with num_threads
set, and the threads are said to be for the enqueue operation. I don't quite understand the purpose of using multiple threads for enqueuing, because the way I see it, enqueue is just putting a value at the end of the queue.
Upvotes: 1
Views: 198
Reputation: 53758
From the Threading and Queues tutorial:
For example, a typical input architecture is to use a RandomShuffleQueue to prepare inputs for training a model:
- Multiple threads prepare training examples and push them in the queue.
- A training thread executes a training op that dequeues mini-batches from the queue.
The TensorFlow Session object is multithreaded, so multiple threads can easily use the same session and run ops in parallel.
The idea is that data pipeline is usually I/O intensive: the data may be fetched from the disk or even streamed from the network. It is quite possible for a GPU not to be a bottleneck in computation, simply because the data isn't fed fast enough to saturate it.
Reading in multiple threads solves this problem: while one thread is waiting on an I/O task, another thread already has some data ready for the GPU. By the time that data is processed, the first thread has hopefully received and prepared its batch, and so on. That's why tf.train.batch, tf.train.shuffle_batch, and other such functions support multi-threaded data processing. Setting num_threads = 1 makes the batching deterministic, but with multiple threads the order of data in the queue is not guaranteed.
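The effect is easy to demonstrate outside TensorFlow. Below is a minimal plain-Python sketch (not the TensorFlow API) of the same producer-consumer idea: each "reader" thread simulates a slow I/O fetch before enqueuing an example, so with several readers the fetches overlap and the consumer is fed sooner. The function names and the 0.05 s delay are illustrative assumptions.

```python
import queue
import threading
import time

def run_pipeline(num_threads, num_items=8, io_delay=0.05):
    """Toy input pipeline: producer threads simulate slow I/O reads
    and push examples into a shared queue (the enqueue side); the
    caller dequeues everything (the training side).
    Returns (items, elapsed_seconds)."""
    q = queue.Queue(maxsize=num_items)

    def producer(thread_id, count):
        for i in range(count):
            time.sleep(io_delay)   # simulated disk/network read
            q.put((thread_id, i))  # the enqueue itself is cheap

    per_thread = num_items // num_threads
    threads = [threading.Thread(target=producer, args=(t, per_thread))
               for t in range(num_threads)]
    start = time.monotonic()
    for t in threads:
        t.start()
    items = [q.get() for _ in range(num_items)]  # "training" thread dequeues
    elapsed = time.monotonic() - start
    for t in threads:
        t.join()
    return items, elapsed

# One reader: every simulated read happens serially.
_, t1 = run_pipeline(num_threads=1)
# Four readers: reads overlap, so the queue fills much faster.
_, t4 = run_pipeline(num_threads=4)
print(t1, t4)
```

With one thread the eight reads take roughly 8 x 0.05 s back to back; with four threads they overlap and finish in about a quarter of that. Note also that with several producers the tuples arrive interleaved in a nondeterministic order, which mirrors the point above about num_threads = 1 being the only deterministic setting.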
Upvotes: 1