Reputation: 89
I have a directory of images with random IDs and a text file that maps each ID to its label. I was wondering if there is a way to read the data directly from disk instead of loading the entire dataset into RAM as a matrix. I know it can be done with Python generators, feeding the data through placeholders:
def generator_(path1, filename):
    ...
    yield x, y

x = tf.placeholder(tf.float32, shape=[None, w, h, 3])
y = tf.placeholder(tf.float32, shape=[None, n_c])
x, y = generator_(path_image, 'labels.txt')
But what is a better way to do this using the tf.data API?
Upvotes: 2
Views: 1378
Reputation: 15119
Supposing your labels.txt has a structure like this (comma-separated image IDs and labels):
1, 0
2, 2
3, 1
...
42, 2
and your images are stored like:
/data/
|---- image1.jpg
|---- image2.jpg
...
|---- image42.jpg
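Conceptually, each line of labels.txt has to be turned into an image file path plus an integer label. In plain Python (outside the TensorFlow graph) that per-line mapping would look like this sketch — the `parse_line` name is just for illustration:

```python
def parse_line(line, separator=",", image_dir="/data/image", ext=".jpg"):
    # Split "42, 2" into the image ID and its label, trimming stray spaces:
    image_id, label = [part.strip() for part in line.split(separator)]
    # Build the full file path from the ID:
    return image_dir + image_id + ext, int(label)

parse_line("42, 2")  # -> ('/data/image42.jpg', 2)
```

The tf.data version below does exactly this, except symbolically, so it runs inside the input pipeline without pulling everything into Python.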
You could then use tf.data in the following way:
import tensorflow as tf

def generate_parser(separator=",", image_path=["/data/image", ".jpg"]):
    image_path = [tf.constant(image_path[0]), tf.constant(image_path[1])]

    def _parse_data(line):
        # Split the line according to separator:
        line_split = tf.string_split([line], separator)
        # Convert label value to int:
        label = tf.string_to_number(line_split.values[1], out_type=tf.int32)
        # Build complete image path from ID:
        image_filepath = image_path[0] + line_split.values[0] + image_path[1]
        # Open image:
        image_string = tf.read_file(image_filepath)
        image_decoded = tf.image.decode_image(image_string)
        return image_decoded, label

    return _parse_data

label_file = "/var/data/labels.txt"
dataset = (tf.data.TextLineDataset([label_file])
           .map(generate_parser(separator=",", image_path=["/data/image", ".jpg"])))
# add .batch(), .repeat(), etc.
Upvotes: 2