brown.2179

Reputation: 1880

TensorFlow String Input Producer Size()

I'm trying to do a sanity check on the size of the queue created by tf.train.string_input_producer (r1.3), but I just get a size of zero when I expect a size of three:

import tensorflow as tf

file_queue = tf.train.string_input_producer(['file1.csv','file2.csv','file3.csv'])

print('type(file_queue): {}'.format(type(file_queue)))

with tf.Session() as sess:

    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)

    file_queue_size = sess.run(file_queue.size())

    coord.request_stop()
    coord.join(threads)

print('result of queue size operation: {}'.format(file_queue_size))

My hunch was that there was some sort of lazy initialization going on, so I figured I would ask the queue for an item and see what the size was afterwards:

import tensorflow as tf

file_queue = tf.train.string_input_producer(['file1.csv','file2.csv','file3.csv'])

print('type(file_queue): {}'.format(type(file_queue)))

with tf.Session() as sess:

    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)

    item = sess.run(file_queue.dequeue())
    file_queue_size = sess.run(file_queue.size())

    coord.request_stop()
    coord.join(threads)

print('result of queue size operation: {}'.format(file_queue_size))

While the size is no longer zero, the reported size is not two and it changes every time the code is run.

I feel like getting the size should be a simple thing, but maybe this is just not the way to interact with a data_flow_ops.FIFOQueue. Any insight into what is going on here would be much appreciated.

Upvotes: 0

Views: 144

Answers (1)

Saurabh Saxena

Reputation: 479

This is an artifact of the asynchronous way in which queues are filled by the queue runners. Try this code:

import tensorflow as tf
import time

file_queue = tf.train.string_input_producer(['file1.csv','file2.csv','file3.csv'])

print('type(file_queue): {}'.format(type(file_queue)))

with tf.Session() as sess:

    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)
    time.sleep(1)
    file_queue_size = sess.run(file_queue.size())

    coord.request_stop()
    coord.join(threads)

print('result of queue size operation: {}'.format(file_queue_size))

The output should say result of queue size operation: 32. Pausing the main thread gives the queue runners enough time to fill up the queue. Why 32 and not 3? Let's look at the signature of string_input_producer:

string_input_producer(
    string_tensor,
    num_epochs=None,
    shuffle=True,
    seed=None,
    capacity=32,
    shared_name=None,
    name=None,
    cancel_op=None
)

Two main things to note here:

  1. num_epochs=None means keep iterating over the list of items forever. To iterate over the list just once, set num_epochs=1; you will also need to call sess.run(tf.local_variables_initializer()). With num_epochs=1, if you run the size op after some dequeues you will see the size decreasing (see the sketch after this list).
  2. capacity=32 is the default capacity of the queue, which is why we see 32 instead of 3 in the output above.
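
To see both effects together, here is a minimal sketch (assuming the same placeholder file names from the question and TF r1.3) that sets num_epochs=1 and an explicit capacity=3, so the size op should top out at three and shrink after each dequeue:

import tensorflow as tf
import time

# num_epochs=1 makes the producer emit the list exactly once; capacity=3 keeps
# the queue small enough that size() cannot exceed the number of file names.
file_queue = tf.train.string_input_producer(
    ['file1.csv', 'file2.csv', 'file3.csv'], num_epochs=1, capacity=3)

with tf.Session() as sess:
    # num_epochs is tracked with a local variable, so it must be initialized.
    sess.run(tf.local_variables_initializer())

    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)
    time.sleep(1)  # give the queue runner time to fill the queue

    print('size before dequeue: {}'.format(sess.run(file_queue.size())))  # should print 3
    sess.run(file_queue.dequeue())
    print('size after dequeue: {}'.format(sess.run(file_queue.size())))   # should print 2

    coord.request_stop()
    coord.join(threads)

Note that after the single epoch is produced the queue runner closes the queue, but size() and dequeue() still work on a closed queue as long as elements remain.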

Upvotes: 1
