Reputation: 1859
I tried to extract the current epoch number while reading data using multiple CPU threads. However, in a trial run I observed output that did not make any sense. Consider the code below:
import tensorflow as tf
import threading

# trainimgs is a list of image filename strings, defined elsewhere.
with tf.Session() as sess:
    train_filename_queue = tf.train.string_input_producer(
        trainimgs, num_epochs=4, shuffle=True)
    value = train_filename_queue.dequeue()
    init_op = tf.group(tf.global_variables_initializer(),
                       tf.local_variables_initializer())
    sess.run(init_op)
    coord = tf.train.Coordinator()
    tf.train.start_queue_runners(coord=coord)
    collections = [v.name for v in
                   tf.get_collection(tf.GraphKeys.LOCAL_VARIABLES,
                                     scope='input_producer/limit_epochs/epochs:0')]
    print(collections)
    threads = [threading.Thread(target=work, args=(coord, value, sess, collections))
               for i in range(20)]
    for t in threads:
        t.start()
    coord.join(threads)
    coord.request_stop()
The work function is defined as follows:
def work(coord, val, sess, collections):
    counter = 0
    while not coord.should_stop():
        try:
            # collections[0] is the name 'input_producer/limit_epochs/epochs:0';
            # running it by name fetches the current value of the epoch counter.
            epoch = sess.run(collections[0])
            filename = sess.run(val).decode(encoding='UTF-8')
            print(filename + ' ' + str(epoch))
        except tf.errors.OutOfRangeError:
            coord.request_stop()
            return None
The output I obtain is the following:
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX TITAN X
major: 5 minor: 2 memoryClockRate (GHz) 1.076
pciBusID 0000:84:00.0
Total memory: 11.92GiB
Free memory: 11.80GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:84:00.0)
I tensorflow/compiler/xla/service/platform_util.cc:58] platform CUDA present with 1 visible devices
I tensorflow/compiler/xla/service/platform_util.cc:58] platform Host present with 20 visible devices
I tensorflow/compiler/xla/service/service.cc:180] XLA service executing computations on platform Host. Devices:
I tensorflow/compiler/xla/service/service.cc:187] StreamExecutor device (0): <undefined>, <undefined>
I tensorflow/compiler/xla/service/platform_util.cc:58] platform CUDA present with 1 visible devices
I tensorflow/compiler/xla/service/platform_util.cc:58] platform Host present with 20 visible devices
I tensorflow/compiler/xla/service/service.cc:180] XLA service executing computations on platform CUDA. Devices:
I tensorflow/compiler/xla/service/service.cc:187] StreamExecutor device (0): GeForce GTX TITAN X, Compute Capability 5.2
['input_producer/limit_epochs/epochs:0']
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_4760.JPEG 0 2
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_703.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_11768.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_3271.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_1015.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_730.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_1945.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_3149.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_4209.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_40.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_11768.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_4760.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_703.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_4209.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_40.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_730.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_3271.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_1015.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_3149.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_1945.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_40.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_4209.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_730.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_1945.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_4760.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_3271.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_703.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_1015.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_11768.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_3149.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_4209.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_11768.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_4760.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_730.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_703.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_3149.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_3271.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_1945.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_1015.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_40.JPEG 0 4
The last number in each line corresponds to the value of the input_producer/limit_epochs/epochs:0 local variable.
For a first trial, I kept only 10 images in the queue, so with num_epochs=4 I should get a total of 40 lines of output, which I do.
Why am I getting the same number 4 in all the lines?
Upvotes: 0
Views: 654
Reputation: 1859
I did a lot of experiments and finally concluded the following:
I used to believe that tf.train.string_input_producer() enqueues the filenames epoch-wise, i.e. that first one complete epoch is enqueued (in multiple stages if the capacity is less than the number of filenames), and only then are further epochs enqueued. That is not actually the case.
When tf.train.start_queue_runners() is executed, all the epochs are enqueued together (in multiple stages if the capacity is less than the number of filenames). The local variable epochs:0 is used by tf.train.string_input_producer to track which epoch is currently being enqueued. Once epochs:0 reaches num_epochs it remains constant, and it does not change no matter how many threads are dequeuing from the queue.
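To see this concretely, here is a minimal single-threaded sketch using the same TF 1.x queue-runner API as in the question (the three filenames are placeholders, not real files; the queue only holds the strings). With only a few filenames and the producer's default capacity of 32, all four epochs fit in the queue at once, so epochs:0 jumps to 4 almost immediately and stays there:

import tensorflow as tf

filenames = ['a.jpg', 'b.jpg', 'c.jpg']  # placeholder filename strings
queue = tf.train.string_input_producer(filenames, num_epochs=4, shuffle=False)
value = queue.dequeue()
epochs_var = tf.get_collection(tf.GraphKeys.LOCAL_VARIABLES,
                               scope='input_producer/limit_epochs/epochs')[0]

with tf.Session() as sess:
    sess.run(tf.group(tf.global_variables_initializer(),
                      tf.local_variables_initializer()))
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    try:
        while not coord.should_stop():
            # epochs_var tracks the epoch being enqueued, not dequeued:
            # it reads 4 long before the last items leave the queue.
            print(sess.run(value).decode('UTF-8'), sess.run(epochs_var))
    except tf.errors.OutOfRangeError:
        coord.request_stop()
    finally:
        coord.join(threads)

On the very first dequeues you may still catch a smaller value (like the 2 on the first line of the output above), because the enqueueing thread is racing with the first sess.run calls.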
When you read the value of epochs:0, you get the instantaneous value of the epochs counter, which tells you which epoch of the dataset is being enqueued at that moment. It does not tell you which epoch of the dataset you are dequeuing. So it is a bad idea to derive the current epoch from the epochs:0 local variable.
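If what you need is the epoch on the dequeue side, one possible workaround is to count the dequeued items yourself: with N filenames, item_number // N is the epoch. The DequeueEpochCounter below is a hypothetical helper of my own, not part of the TensorFlow API, and with 20 worker threads the printed epoch can be off by one near epoch boundaries, since the dequeue and the print are not atomic:

import threading
import tensorflow as tf

class DequeueEpochCounter(object):
    """Derives the current epoch from the number of items dequeued so far."""

    def __init__(self, num_files):
        self._num_files = num_files
        self._count = 0
        self._lock = threading.Lock()

    def next_epoch(self):
        # 0-based epoch of the item being dequeued; the lock keeps the
        # counter consistent across worker threads.
        with self._lock:
            epoch = self._count // self._num_files
            self._count += 1
            return epoch

# Drop-in replacement for the work() function above, using the same
# coord, value and sess as in the question:
counter = DequeueEpochCounter(num_files=10)  # 10 images in the trial

def work(coord, val, sess):
    while not coord.should_stop():
        try:
            filename = sess.run(val).decode('UTF-8')
            print(filename + ' ' + str(counter.next_epoch()))
        except tf.errors.OutOfRangeError:
            coord.request_stop()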
Upvotes: 1