Reputation: 89
I have a directory of images with random IDs and a text file that maps each ID to its label. I was wondering if there is a way to read the data directly from disk instead of loading the entire dataset into RAM as a matrix. I know it can be done with Python generators, feeding the data through placeholders:
def generator_(path1, filename):
    ...
    yield x, y

x = tf.placeholder(tf.float32, shape=[None, w, h, 3])
y = tf.placeholder(tf.float32, shape=[None, n_c])
x, y = generator_(path_image, 'labels.txt')
But what is a better way to do this using the tf.data API?
Upvotes: 2
Views: 1378
Reputation: 15119
Supposing your labels.txt has a structure like this (comma-separated image IDs and labels):
1, 0
2, 2
3, 1
...
42, 2
and your images are stored like:
/data/
|---- image1.jpg
|---- image2.jpg
...
|---- image42.jpg
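Conceptually, each line of labels.txt has to be turned into an image file path plus an integer label. In plain Python (outside the TensorFlow graph) that per-line mapping would look like this sketch — the `parse_line` name is just for illustration:

```python
def parse_line(line, separator=",", image_dir="/data/image", ext=".jpg"):
    # Split "42, 2" into the image ID and its label, trimming stray spaces:
    image_id, label = [part.strip() for part in line.split(separator)]
    # Build the full file path from the ID:
    return image_dir + image_id + ext, int(label)

parse_line("42, 2")  # -> ('/data/image42.jpg', 2)
```

The tf.data version below does exactly this, except symbolically, so it runs inside the input pipeline without pulling everything into Python.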
You could then use tf.data in the following way:
import tensorflow as tf

def generate_parser(separator=",", image_path=["/data/image", ".jpg"]):
    image_path = [tf.constant(image_path[0]), tf.constant(image_path[1])]

    def _parse_data(line):
        # Split the line according to separator:
        line_split = tf.string_split([line], separator)
        # Convert label value to int:
        label = tf.string_to_number(line_split.values[1], out_type=tf.int32)
        # Build complete image path from ID:
        image_filepath = image_path[0] + line_split.values[0] + image_path[1]
        # Open image:
        image_string = tf.read_file(image_filepath)
        image_decoded = tf.image.decode_image(image_string)
        return image_decoded, label

    return _parse_data

label_file = "/var/data/labels.txt"
dataset = (tf.data.TextLineDataset([label_file])
           .map(generate_parser(separator=",", image_path=["/data/image", ".jpg"])))
# add .batch(), .repeat(), etc.
Upvotes: 2