CountVonCount

Reputation: 85

tf.train.batch output is not deterministic

I'm using Tensorflow for learning MNIST data. For batching I create a batch from single images like this:

import tensorflow as tf

# Assemble single images into mini-batches via a prefetching queue.
BatchedInputs = list(tf.train.batch(
    Inputs,
    batch_size=BatchSize,
    num_threads=self._PreprocessThreads,
    capacity=self._MinimumSamplesInQueue + 3 * BatchSize))

When I create batches of size 1 (for testing) and look at the images in TensorBoard, I can see that the images are not the same across runs. They are not outright shuffled, but sometimes a different image shows up.

I would expect this operation to produce deterministic output, but that is not the case. Am I doing something wrong (e.g. starting the queues incorrectly)?

Upvotes: 2

Views: 589

Answers (1)

mrry

Reputation: 126154

If you set num_threads > 1 when calling tf.train.batch(), the resulting program will be non-deterministic, because this creates num_threads uncoordinated prefetching threads that each evaluate Inputs and insert the next element into the queue. Since the prefetching threads are uncoordinated, they race to enqueue elements, which leads to non-determinism in the order of the queue's elements.
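You can observe the race with a minimal, self-contained sketch (not part of the original code; the toy range_input_producer pipeline here is an assumption for illustration). The source queue yields 0..9 in a fixed order, but with several prefetching threads the batch order can differ between runs:

import tensorflow as tf

# Toy input pipeline: a queue that yields 0..9 in a fixed order.
input_element = tf.train.range_input_producer(10, shuffle=False).dequeue()

# With num_threads > 1, the prefetching threads race to enqueue,
# so the order of batched elements can differ between runs.
batch = tf.train.batch([input_element], batch_size=1, num_threads=4,
                       capacity=32)

with tf.Session() as sess:
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    print([sess.run(batch) for _ in range(10)])  # order may vary per run
    coord.request_stop()
    coord.join(threads)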

Setting num_threads = 1 should make this part of your program deterministic, assuming that the other parts of your program are deterministic. However, this is a weak guarantee, and—in particular—any use of shuffling in the queue-based input routines will make the program non-deterministic.
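For example, the call from the question becomes deterministic (up to the rest of the program) by fixing the thread count. A sketch, assuming Inputs, BatchSize, and self._MinimumSamplesInQueue are defined as in the question:

BatchedInputs = list(tf.train.batch(
    Inputs,
    batch_size=BatchSize,
    num_threads=1,  # single prefetching thread: deterministic enqueue order
    capacity=self._MinimumSamplesInQueue + 3 * BatchSize))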

Upvotes: 6
