bfra

Reputation: 321

TensorFlow input pipeline: samples are read more than once

I'm trying to implement an input pipeline for my model that reads from TFRecord binary files; each binary file contains one example (image, label, and other data I need).

I have a text file with the list of file paths; then:

  1. I read the text file into a list, which I feed to string_input_producer() to generate a queue;
  2. I feed the queue to a TFRecordReader, which reads the serialized examples, and I decode the binary data;
  3. I use shuffle_batch() to arrange the examples into batches;
  4. I use the batches to evaluate my model.

The problem is that the same example can be read multiple times, while some examples may not be visited at all. I set the number of steps to the total number of images divided by the batch size, so I would expect that by the end of the last step every input example has been visited exactly once; instead, some are visited more than once and some never (seemingly at random). This makes my test evaluation totally unreliable.

If anybody has a hint about what I am doing wrong, please let me know.

A simplified version of my code for model testing is below. Thanks!

import tensorflow as tf


def my_input(file_list, batch_size):

    # build the list of TFRecord file paths from the text file
    filename = []
    with open(file_list, 'r') as f:
        for line in f:
            filename.append(params.TEST_RECORDS_DATA_DIR + line.strip())

    filename_queue = tf.train.string_input_producer(filename)

    reader = tf.TFRecordReader()
    _, serialized_example = reader.read(filename_queue)

    features = tf.parse_single_example(
        serialized_example,
        features={
            'image_raw': tf.FixedLenFeature([], tf.string),
            'label_raw': tf.FixedLenFeature([], tf.string),
            'name': tf.FixedLenFeature([], tf.string)
            })

    image = tf.decode_raw(features['image_raw'], tf.uint8)
    image.set_shape(params.IMAGE_HEIGHT*params.IMAGE_WIDTH*3)
    image = tf.reshape(image, (params.IMAGE_HEIGHT,params.IMAGE_WIDTH,3))
    image = tf.cast(image, tf.float32)/255.0
    image = preprocess(image)

    label = tf.decode_raw(features['label_raw'], tf.uint8)
    label.set_shape(params.NUM_CLASSES)

    name = features['name']

    # group decoded examples into shuffled batches
    images, labels, image_names = tf.train.shuffle_batch([image, label, name],
            batch_size=batch_size, num_threads=2,
            capacity=1000 + 3 * batch_size, min_after_dequeue=1000)

    return images, labels, image_names


def main():

    with tf.Graph().as_default():

        # call input operations
        images, labels, image_names = my_input(file_list=params.TEST_FILE_LIST, batch_size=params.BATCH_SIZE)

        # load a trained model and make predictions     
        prediction = infer(images, labels, image_names)

        with tf.Session() as sess:

            # start the queue runners that fill the input queues
            tf.train.start_queue_runners(sess=sess)

            for step in range(params.N_STEPS):
                prediction_values = sess.run([prediction])
                # process output

    return

Upvotes: 0

Views: 232

Answers (1)

sygi

Reputation: 4647

My guess would be that tf.train.string_input_producer(filename) produces the filenames indefinitely by default, and since you batch the examples with multiple (2) threads, it may be that one thread has already started processing the files a second time while the other has not yet finished the first round. To read each example exactly once, use:

tf.train.string_input_producer(filename, num_epochs=1)

and initialize local variables at the start of the session:

sess.run(tf.initialize_local_variables())
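For completeness, here is a minimal sketch of how the single-epoch evaluation loop could look (assuming the TF 1.x queue-based API, with prediction and filename taken from the question's code). Instead of counting steps, the loop simply runs until the queue raises OutOfRangeError, which happens once every file has been read exactly once:

# inside my_input(): produce each filename exactly once
filename_queue = tf.train.string_input_producer(filename, num_epochs=1)

# inside main():
with tf.Session() as sess:
    # num_epochs is tracked with a local variable, so it must be initialized
    sess.run(tf.initialize_local_variables())

    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    try:
        while not coord.should_stop():
            prediction_values = sess.run([prediction])
            # process output
    except tf.errors.OutOfRangeError:
        # raised after every example has been dequeued once
        pass
    finally:
        coord.request_stop()
        coord.join(threads)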

Upvotes: 0
