Extract integer labels from tf.data pipelines when consuming sets of files

Question

I am following the TensorFlow guide on creating an input pipeline. Here is my code:

import os
import pathlib

import tensorflow as tf

# download data
flowers_root = tf.keras.utils.get_file(
    'flower_photos',
    'https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz',
    untar=True)
flowers_root = pathlib.Path(flowers_root)
list_ds = tf.data.Dataset.list_files(str(flowers_root/'*/*'))

def parse_image(filename):
    # the parent directory name is the class, e.g. b'roses' (a tf.string tensor)
    parts = tf.strings.split(filename, os.sep)
    label = parts[-2]
    image = tf.io.read_file(filename)
    image = tf.image.decode_jpeg(image, channels=3)
    image = tf.image.convert_image_dtype(image, tf.float32)
    image = tf.image.resize(image, [128, 128])
    return image, label

# create model
model = tf.keras.models.Sequential()

model.add(tf.keras.layers.Conv2D(filters=32, kernel_size=(3, 3),
                                 input_shape=(128, 128, 3),
                                 activation='relu'))
model.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2)))
model.add(tf.keras.layers.BatchNormalization())

model.add(tf.keras.layers.Conv2D(32, (3, 3), activation='relu'))
model.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2)))

model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(32))
model.add(tf.keras.layers.Activation('relu'))
model.add(tf.keras.layers.Dense(3, activation='softmax'))
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])

labeled_ds = list_ds.map(parse_image)
labeled_ds = labeled_ds.shuffle(buffer_size=2)
labeled_ds = labeled_ds.batch(2)
labeled_ds = labeled_ds.repeat(3)
# running this line gives an error
model.fit(labeled_ds)

I get the following error while running this:

UnimplementedError:  Cast string to float is not supported
     [[node loss/dense_3_loss/Cast (defined at :1) ]] [Op:__inference_distributed_function_1543]

Function call stack:
distributed_function
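The failing node is a `Cast` inside the loss (`loss/dense_3_loss/Cast`): the loss appears to convert the true labels to the float dtype of the predictions, and TensorFlow has no string-to-float cast kernel. A minimal sketch reproducing the same error in isolation, assuming TensorFlow 2.x in eager mode; the label values and the helper name `cast_error_message` are illustrative only:

```python
import tensorflow as tf


def cast_error_message():
    """Cast string labels to float32, as the loss does with y_true, and return the error text."""
    labels = tf.constant(["roses", "tulips"])  # illustrative string labels
    try:
        tf.cast(labels, tf.float32)
    except tf.errors.OpError as err:  # expected: UnimplementedError, a subclass of OpError
        return err.message
    return None  # only reached if string-to-float casting were supported


print(cast_error_message())
```

The printed message should mention that casting string to float is not supported, matching the traceback above.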

I understand this happens because the labels are class-name strings (sunflowers, tulips and roses), while model.fit() needs numeric labels. How do I convert the labels to integers, or (I'm guessing) one-hot encoded vectors, in TensorFlow 2.0? I have searched the documentation thoroughly but found nothing.
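One common approach is to compare the string label against a tensor of class names: the element-wise comparison yields a boolean one-hot vector, which can be cast to float for `categorical_crossentropy` or reduced with `tf.argmax` to an integer index. A minimal sketch, assuming the three class names mentioned in the question; `CLASS_NAMES`, the helper functions and the example path are hypothetical:

```python
import os

import tensorflow as tf

# Assumption: the three classes named in the question. In practice these would
# be the sorted subdirectory names under flowers_root.
CLASS_NAMES = tf.constant(["roses", "sunflowers", "tulips"])


def label_from_path(filename):
    """Same split as parse_image: the parent directory name is the label."""
    parts = tf.strings.split(filename, os.sep)
    return parts[-2]  # a scalar tf.string tensor, e.g. b'roses'


def label_to_one_hot(label):
    """String label -> float32 one-hot vector (matches categorical_crossentropy)."""
    return tf.cast(tf.equal(label, CLASS_NAMES), tf.float32)


def label_to_index(label):
    """String label -> int64 class index (matches sparse_categorical_crossentropy)."""
    return tf.argmax(label_to_one_hot(label))


# Hypothetical path following the <root>/<class>/<file> layout.
path = tf.constant(os.path.join("flower_photos", "tulips", "example.jpg"))
label = label_from_path(path)
print(label.numpy())                    # b'tulips'
print(label_to_one_hot(label).numpy())  # [0. 0. 1.]
print(label_to_index(label).numpy())    # 2
```

In the pipeline this could be applied per element, e.g. `list_ds.map(parse_image).map(lambda image, label: (image, label_to_one_hot(label)))`. Since the model is compiled with `categorical_crossentropy`, the one-hot form fits as-is; the integer form would need `loss='sparse_categorical_crossentropy'` instead. A label missing from `CLASS_NAMES` produces an all-zero vector and `tf.argmax` then silently returns 0, so the list must be complete: the full flower_photos archive has five class folders (daisy, dandelion, roses, sunflowers, tulips), so unless two were removed, both `CLASS_NAMES` and the final `Dense(3)` layer would need five entries.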

Would really appreciate the help.
