Reputation: 2164
I am following the TensorFlow documentation on creating an input pipeline. Here are the code snippets:
import os
import pathlib

import tensorflow as tf

# download data
flowers_root = tf.keras.utils.get_file(
    'flower_photos',
    'https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz',
    untar=True)
flowers_root = pathlib.Path(flowers_root)
list_ds = tf.data.Dataset.list_files(str(flowers_root/'*/*'))

def parse_image(filename):
    # the label is the name of the directory that contains the image
    parts = tf.strings.split(filename, os.sep)
    label = parts[-2]
    image = tf.io.read_file(filename)
    image = tf.image.decode_jpeg(image)
    image = tf.image.convert_image_dtype(image, tf.float32)
    image = tf.image.resize(image, [128, 128])
    return image, label
# create model
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Conv2D(filters=32, kernel_size=(3, 3),
                                 input_shape=(128, 128, 3),
                                 activation='relu'))
model.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2)))
model.add(tf.keras.layers.BatchNormalization())
model.add(tf.keras.layers.Conv2D(32, (3, 3), activation='relu'))
model.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2)))
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(32))
model.add(tf.keras.layers.Activation('relu'))
model.add(tf.keras.layers.Dense(3, activation='softmax'))
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])
labeled_ds = list_ds.map(parse_image)
labeled_ds = labeled_ds.shuffle(buffer_size=2)
labeled_ds = labeled_ds.batch(2)
labeled_ds = labeled_ds.repeat(3)
# running this line gives an error
model.fit(labeled_ds)
I get the following error while running this:
UnimplementedError: Cast string to float is not supported
[[node loss/dense_3_loss/Cast (defined at <ipython-input-19-52391dd0864d>:1) ]] [Op:__inference_distributed_function_1543]
Function call stack:
distributed_function
I understand this is because the class names are strings (sunflowers, tulips and roses), while model.fit() only accepts numerical inputs. However, how do I convert the labels to a numerical form (I'm guessing one-hot encoded representations) using TensorFlow 2.0? I have searched high and low in the documentation, but to no avail.
I would really appreciate the help.
Reputation: 1134
I feel that the way the tutorial creates path-label pairs is very inconvenient if the label isn't already encoded as an integer. For this reason, I advise splitting the data into a list of paths and a list of the labels that correspond to those paths. Basically, you list the directory of each flower type and assign the same integer label to every path in it. Then you can create a data set with tf.data.Dataset.from_tensor_slices() from a tuple that contains the list of paths and the list of labels. Finally, you need to modify your parse_image() to take in both a path and a label. Below is code that should do the trick.
flowers_root = pathlib.Path(flowers_root)

# Modified parser: the label is passed in instead of being parsed from the path
def parse_image(filename, label):
    image = tf.io.read_file(filename)
    image = tf.image.decode_jpeg(image, channels=3)
    image = tf.image.convert_image_dtype(image, tf.float32)
    image = tf.image.resize(image, [128, 128])
    return image, label

# Extract all flower types (directories only, so stray files in the root are skipped)
label_map = sorted(name for name in tf.io.gfile.listdir(str(flowers_root))
                   if tf.io.gfile.isdir(str(flowers_root / name)))

# Extract pairs of paths and integer labels
# ex. ('../../example.jpg', 0)
path_label_pairs = [(str(flowers_root / label / path), index)
                    for index, label in enumerate(label_map)
                    for path in tf.io.gfile.listdir(str(flowers_root / label))]

# Separate paths and labels into their own lists
paths = [pair[0] for pair in path_label_pairs]
labels = [pair[1] for pair in path_label_pairs]

# New data set using from_tensor_slices
dataset = tf.data.Dataset.from_tensor_slices((paths, labels)).map(parse_image)
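One more thing to watch: the labels produced above are integers, while the model in the question is compiled with categorical_crossentropy, which expects one-hot vectors. Below is a minimal sketch of that conversion using tf.one_hot inside a map() step. The class names, the all-zero images and the to_one_hot helper are placeholders I made up so the snippet runs on its own; in the real pipeline you would map over the dataset built above and use len(label_map) as the depth.

```python
import tensorflow as tf

# Hypothetical class names; in the real pipeline these come from label_map.
class_names = ['daisy', 'roses', 'tulips']
num_classes = len(class_names)

# Stand-in for the (image, integer label) pairs returned by parse_image.
images = tf.zeros([4, 128, 128, 3], dtype=tf.float32)
labels = tf.constant([0, 2, 1, 2], dtype=tf.int32)
dataset = tf.data.Dataset.from_tensor_slices((images, labels))

def to_one_hot(image, label):
    # tf.one_hot turns an integer class index into a one-hot float vector.
    return image, tf.one_hot(label, depth=num_classes)

one_hot_ds = dataset.map(to_one_hot).batch(2)

# Inspect the first batch: images are (2, 128, 128, 3), labels are (2, 3).
for image_batch, label_batch in one_hot_ds.take(1):
    print(image_batch.shape, label_batch.shape)
```

Alternatively, you can skip the one-hot step, keep the integer labels, and compile the model with loss='sparse_categorical_crossentropy' instead. Either way, the size of the final Dense layer should match the number of classes.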