Reputation: 321
I'm trying to implement an input pipeline for my model that reads from TFRecords binary files; each binary file contains one example (image, label, and other data I need).
I have a text file that lists the file paths.
The problem is that the same example can be read multiple times while other examples are never visited. I set the number of steps to the total number of images divided by the batch size, so I would expect every input example to have been visited by the end of the last step. This is not the case: some examples are visited more than once and some never, and which ones varies randomly. This makes my test evaluation totally unreliable.
If anybody has a hint about what I am doing wrong, please let me know.
A simplified version of my model-testing code is below. Thanks!
def my_input(file_list, batch_size):
    # build the list of TFRecords paths from the text file (one path per line)
    filename = []
    with open(file_list, 'r') as f:
        for line in f:
            filename.append(params.TEST_RECORDS_DATA_DIR + line.rstrip('\n'))
    filename_queue = tf.train.string_input_producer(filename)
    reader = tf.TFRecordReader()
    _, serialized_example = reader.read(filename_queue)
    features = tf.parse_single_example(
        serialized_example,
        features={
            'image_raw': tf.FixedLenFeature([], tf.string),
            'label_raw': tf.FixedLenFeature([], tf.string),
            'name': tf.FixedLenFeature([], tf.string)
        })
    # decode the raw bytes into an image tensor scaled to [0, 1]
    image = tf.decode_raw(features['image_raw'], tf.uint8)
    image.set_shape([params.IMAGE_HEIGHT * params.IMAGE_WIDTH * 3])
    image = tf.reshape(image, (params.IMAGE_HEIGHT, params.IMAGE_WIDTH, 3))
    image = tf.cast(image, tf.float32) / 255.0
    image = preprocess(image)
    label = tf.decode_raw(features['label_raw'], tf.uint8)
    label.set_shape([params.NUM_CLASSES])
    name = features['name']
    images, labels, image_names = tf.train.batch([image, label, name],
        batch_size=batch_size, num_threads=2,
        capacity=1000 + 3 * batch_size, min_after_dequeue=1000)
    return images, labels, image_names
def main():
    with tf.Graph().as_default():
        # call input operations
        images, labels, image_names = my_input(file_list=params.TEST_FILE_LIST, batch_size=params.BATCH_SIZE)
        # load a trained model and make predictions
        prediction = infer(images, labels, image_names)
        with tf.Session() as sess:
            for step in range(params.N_STEPS):
                prediction_values = sess.run([prediction])
                # process output
    return
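To put numbers on the symptom, one option is to gather the names returned by each batch and compare them with the file list. The sketch below is plain Python and makes some assumptions: `coverage_report` is a hypothetical helper, the toy lists stand in for the real names, and the `name` feature stored in each record is assumed to identify its example uniquely (names fetched with `sess.run` come back as bytes, so they may need decoding first).

```python
from collections import Counter


def coverage_report(expected_names, seen_names):
    """Compare the names the pipeline returned with the expected names.

    Returns (missing, duplicated, unexpected):
      missing    - expected names that were never seen
      duplicated - dict of name -> count for names seen more than once
      unexpected - seen names that are not in the expected list
    """
    counts = Counter(seen_names)
    expected = set(expected_names)
    missing = sorted(expected - set(counts))
    duplicated = {name: n for name, n in counts.items() if n > 1}
    unexpected = sorted(set(counts) - expected)
    return missing, duplicated, unexpected


# Toy data standing in for the names collected from image_names per batch.
expected = ["a.tfrecords", "b.tfrecords", "c.tfrecords", "d.tfrecords"]
seen = ["a.tfrecords", "b.tfrecords", "b.tfrecords", "d.tfrecords"]

missing, duplicated, unexpected = coverage_report(expected, seen)
print("missing:", missing)
print("duplicated:", duplicated)
print("unexpected:", unexpected)
```

An empty `missing` list and an empty `duplicated` dict after the last step would mean every example was evaluated exactly once.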
Upvotes: 0
Views: 232
Reputation: 4647
My guess would be that tf.train.string_input_producer(filename) is set to produce the filenames indefinitely, and if you batch the examples in multiple (2) threads, it may be the case that one thread has already started processing the files a second time while the other has not yet finished the first round. To read each example exactly once, use:
tf.train.string_input_producer(filename, num_epochs=1)
and initialize local variables at the start of the session:
sess.run(tf.initialize_local_variables())
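With num_epochs=1 the input queue closes once every file has been produced, and the next dequeue raises tf.errors.OutOfRangeError, so the evaluation loop can run until that signal instead of for a fixed number of steps. The following is only a sketch under assumptions: `drain` and `evaluate_once` are hypothetical helper names, the TF 1.x API used in this thread is assumed, and `prediction` is the tensor built in the question's main(). Loading the trained model is left out.

```python
def drain(run_step, exhausted):
    """Call run_step() until it raises `exhausted`; return all results."""
    results = []
    while True:
        try:
            results.append(run_step())
        except exhausted:
            return results


def evaluate_once(prediction):
    """Hypothetical TF 1.x glue: evaluate `prediction` over a single epoch."""
    import tensorflow as tf  # imported lazily; only this function needs it

    with tf.Session() as sess:
        # num_epochs is tracked in a local variable, so local variables
        # have to be initialized before the queue runners start.
        sess.run(tf.initialize_local_variables())
        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(sess=sess, coord=coord)
        try:
            # Stops when the closed input queue raises OutOfRangeError.
            return drain(lambda: sess.run(prediction),
                         tf.errors.OutOfRangeError)
        finally:
            coord.request_stop()
            coord.join(threads)
```

Note that tf.train.batch drops a trailing partial batch by default; if the number of examples is not a multiple of the batch size, passing allow_smaller_final_batch=True should keep those last examples in the evaluation.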
Upvotes: 0