Issues using the TensorFlow Datasets API with Keras

I'm trying to fit a Keras CNN model, feeding it data handled by TensorFlow's Datasets API. However, I keep hitting the same exception, despite following the official documentation:

ValueError: No data provided for "conv2d_8_input". Need data for each key in: ['conv2d_8_input']
# conv2d_8 is the first Conv2D layer of my model, see below

I'm using the MNIST dataset from tensorflow-datasets. Images are normalized and class labels are converted into one-hot encodings. An excerpt of the code is below.

test_data, train_data = tfds.load("mnist", split=Split.ALL.subsplit([1, 3]))

# [...] Images are normalized using Dataset.map method
# [...] Labels are converted into one-hot encodings as well, using tf.one_hot function

model = keras.Sequential([
    keras.layers.Conv2D(
        32,
        kernel_size=5,
        padding="same",
        input_shape=(28, 28, 1),
        activation="relu",
    ),
    keras.layers.MaxPooling2D(
        (2, 2),
        padding="same"
    ),
    keras.layers.Conv2D(
        64,
        kernel_size=5,
        padding="same",
        activation="relu"
    ),
    keras.layers.MaxPooling2D(
        (2, 2),
        padding="same"
    ),
    keras.layers.Flatten(),
    keras.layers.Dense(
        512,
        activation="relu"
    ),
    keras.layers.Dropout(rate=0.4),
    keras.layers.Dense(10, activation="softmax")
])

model.compile(
    optimizer=tf.train.AdamOptimizer(0.01),
    loss="categorical_crossentropy",
    metrics=["accuracy"]
)

train_data = train_data.batch(32).repeat()
test_data = test_data.batch(32).repeat()

model.fit(
    train_data,
    epochs=10,
    steps_per_epoch=30,
    validation_data=test_data,
    validation_steps=3
) # The exception occurs at this step

I don't understand why it doesn't work. I tried feeding the fit method one-shot iterators instead of the datasets, but I got the same result. I'm not used to Keras and TensorFlow (I usually work with PyTorch), so I may be missing something obvious.

Upvotes: 2

Answers (3)

Oli

Reputation: 43

For those coming to this page after following the TF 2.0 Beta tutorial on loading images (https://www.tensorflow.org/beta/tutorials/load_data/images):

I was able to avoid the error by returning a tuple from the preprocess_image function:

def preprocess_image(image):
    image = tf.image.decode_jpeg(image, channels=3)
    image = tf.image.resize(image, [192, 192])
    image /= 255.0  # normalize to [0,1] range
    return (image, image)

I am not using the labels in my use case, so you might have to make other changes to follow the tutorial.

Upvotes: 3

Philippe Chavanne

Reputation: 344

You can load data from tensorflow-datasets directly as (input, label) tuples by passing as_supervised=True:

test_data, train_data = tfds.load("mnist", split=tfds.Split.ALL.subsplit([1, 3]), as_supervised=True)

Upvotes: 1

Renn Kane

Reputation: 493

Ok, I got it. I enabled eager execution to see if Keras would yield a more precise exception, and I got this:

ValueError: Output of generator should be a tuple `(x, y, sample_weight)` or `(x, y)`. Found: {'image': <tf.Tensor: id=1012, shape=(32, 28, 28, 1), dtype=float64, numpy=array([...])>, 'label': <tf.Tensor: id=1013, shape=(32, 10), dtype=uint8, numpy=array([...]), dtype=uint8)>}

Indeed, the components of my datasets (images and their associated labels) have names ("image" and "label"), because this is how tensorflow_datasets loads them. As a result, an iterator on the datasets yields a dictionary with two entries, keyed "image" and "label".

However, Keras expects a tuple of two values (inputs, targets) (or three values (inputs, targets, sample_weights)), and it doesn't like the dictionary yielded by the Dataset iterator (hence the error I got).

I added the following code before model.fit:

train_data = train_data.map(lambda x: tuple(x.values()))
test_data = test_data.map(lambda x: tuple(x.values()))

And it works.
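
A slightly more explicit variant, as a minimal sketch: instead of relying on the order in which x.values() returns the components, select them by key. The helper name to_supervised is hypothetical (it is not part of TensorFlow or tensorflow-datasets), and the strings below are only stand-ins for the image and label tensors.

```python
def to_supervised(example):
    """Turn a {"image": ..., "label": ...} element into an (inputs, targets) tuple."""
    # Select by key so the result does not depend on the order of dict.values().
    return example["image"], example["label"]


# Assumed usage with the datasets from the question (not run here):
# train_data = train_data.map(to_supervised)
# test_data = test_data.map(to_supervised)

# Plain-Python check of the mapping, with strings standing in for tensors.
element = {"label": "one_hot_label", "image": "normalized_image"}
pair = to_supervised(element)
print(pair)  # ('normalized_image', 'one_hot_label')
```

Because the function only indexes into the dictionary, it should behave the same way when passed to Dataset.map, where each element arrives as a dictionary of tensors.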

Upvotes: 1
