Reputation: 1859
I tried to extract the current epoch number while reading data using multiple CPU threads. However, in a trial run I observed output that did not make any sense. Consider the code below:
import tensorflow as tf
import threading

# trainimgs is a list of image filename strings, defined elsewhere.
with tf.Session() as sess:
    train_filename_queue = tf.train.string_input_producer(
        trainimgs, num_epochs=4, shuffle=True)
    value = train_filename_queue.dequeue()
    init_op = tf.group(tf.global_variables_initializer(),
                       tf.local_variables_initializer())
    sess.run(init_op)
    coord = tf.train.Coordinator()
    tf.train.start_queue_runners(coord=coord)
    collections = [v.name for v in
                   tf.get_collection(tf.GraphKeys.LOCAL_VARIABLES,
                                     scope='input_producer/limit_epochs/epochs:0')]
    print(collections)
    threads = [threading.Thread(target=work, args=(coord, value, sess, collections))
               for i in range(20)]
    for t in threads:
        t.start()
    coord.join(threads)
    coord.request_stop()
The work function is defined as follows:
def work(coord, val, sess, collections):
    counter = 0
    while not coord.should_stop():
        try:
            # collections[0] is the name 'input_producer/limit_epochs/epochs:0';
            # running it by name fetches the current value of the epoch counter.
            epoch = sess.run(collections[0])
            filename = sess.run(val).decode(encoding='UTF-8')
            print(filename + ' ' + str(epoch))
        except tf.errors.OutOfRangeError:
            coord.request_stop()
            return None
The output I obtain is the following:
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX TITAN X
major: 5 minor: 2 memoryClockRate (GHz) 1.076
pciBusID 0000:84:00.0
Total memory: 11.92GiB
Free memory: 11.80GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:84:00.0)
I tensorflow/compiler/xla/service/platform_util.cc:58] platform CUDA present with 1 visible devices
I tensorflow/compiler/xla/service/platform_util.cc:58] platform Host present with 20 visible devices
I tensorflow/compiler/xla/service/service.cc:180] XLA service executing computations on platform Host. Devices:
I tensorflow/compiler/xla/service/service.cc:187] StreamExecutor device (0): <undefined>, <undefined>
I tensorflow/compiler/xla/service/platform_util.cc:58] platform CUDA present with 1 visible devices
I tensorflow/compiler/xla/service/platform_util.cc:58] platform Host present with 20 visible devices
I tensorflow/compiler/xla/service/service.cc:180] XLA service executing computations on platform CUDA. Devices:
I tensorflow/compiler/xla/service/service.cc:187] StreamExecutor device (0): GeForce GTX TITAN X, Compute Capability 5.2
['input_producer/limit_epochs/epochs:0']
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_4760.JPEG 0 2
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_703.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_11768.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_3271.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_1015.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_730.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_1945.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_3149.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_4209.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_40.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_11768.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_4760.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_703.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_4209.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_40.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_730.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_3271.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_1015.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_3149.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_1945.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_40.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_4209.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_730.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_1945.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_4760.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_3271.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_703.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_1015.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_11768.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_3149.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_4209.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_11768.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_4760.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_730.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_703.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_3149.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_3271.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_1945.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_1015.JPEG 0 4
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_40.JPEG 0 4
The last number in each line corresponds to the value of the input_producer/limit_epochs/epochs:0 local variable.
For a first trial, I kept only 10 images in the queue, so with num_epochs=4 I should get a total of 40 lines of output, which I do.
Why am I getting the same number 4 in all the lines?
Upvotes: 0
Views: 654
Reputation: 1859
I did a lot of experiments and finally concluded the following:
I used to believe that tf.train.string_input_producer() enqueues the filenames epoch-wise, i.e. that first one complete epoch is enqueued (in multiple stages if the capacity is less than the number of filenames), and only then are further epochs enqueued. That is not actually the case.
When tf.train.start_queue_runners() is executed, all the epochs are enqueued together (in multiple stages if the capacity is less than the number of filenames). The local variable epochs:0 is used by tf.train.string_input_producer to track which epoch is currently being enqueued. Once epochs:0 reaches num_epochs it remains constant, and it does not change no matter how many threads are dequeuing from the queue.
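To see this concretely, here is a minimal single-threaded sketch using the same TF 1.x queue-runner API as in the question (the three filenames are placeholders, not real files; the queue only holds the strings). With only a few filenames and the producer's default capacity of 32, all four epochs fit in the queue at once, so epochs:0 jumps to 4 almost immediately and stays there:

import tensorflow as tf

filenames = ['a.jpg', 'b.jpg', 'c.jpg']  # placeholder filename strings
queue = tf.train.string_input_producer(filenames, num_epochs=4, shuffle=False)
value = queue.dequeue()
epochs_var = tf.get_collection(tf.GraphKeys.LOCAL_VARIABLES,
                               scope='input_producer/limit_epochs/epochs')[0]

with tf.Session() as sess:
    sess.run(tf.group(tf.global_variables_initializer(),
                      tf.local_variables_initializer()))
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    try:
        while not coord.should_stop():
            # epochs_var tracks the epoch being enqueued, not dequeued:
            # it reads 4 long before the last items leave the queue.
            print(sess.run(value).decode('UTF-8'), sess.run(epochs_var))
    except tf.errors.OutOfRangeError:
        coord.request_stop()
    finally:
        coord.join(threads)

On the very first dequeues you may still catch a smaller value (like the 2 on the first line of the output above), because the enqueueing thread is racing with the first sess.run calls.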
When you read the value of epochs:0, you get the instantaneous value of the epochs counter, which tells you which epoch of the dataset is being enqueued at that moment. It does not tell you which epoch of the dataset you are dequeuing. So it is a bad idea to derive the current epoch from the epochs:0 local variable.
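If what you need is the epoch on the dequeue side, one possible workaround is to count the dequeued items yourself: with N filenames, item_number // N is the epoch. The DequeueEpochCounter below is a hypothetical helper of my own, not part of the TensorFlow API, and with 20 worker threads the printed epoch can be off by one near epoch boundaries, since the dequeue and the print are not atomic:

import threading
import tensorflow as tf

class DequeueEpochCounter(object):
    """Derives the current epoch from the number of items dequeued so far."""

    def __init__(self, num_files):
        self._num_files = num_files
        self._count = 0
        self._lock = threading.Lock()

    def next_epoch(self):
        # 0-based epoch of the item being dequeued; the lock keeps the
        # counter consistent across worker threads.
        with self._lock:
            epoch = self._count // self._num_files
            self._count += 1
            return epoch

# Drop-in replacement for the work() function above, using the same
# coord, value and sess as in the question:
counter = DequeueEpochCounter(num_files=10)  # 10 images in the trial

def work(coord, val, sess):
    while not coord.should_stop():
        try:
            filename = sess.run(val).decode('UTF-8')
            print(filename + ' ' + str(counter.next_epoch()))
        except tf.errors.OutOfRangeError:
            coord.request_stop()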
Upvotes: 1