Burger

Reputation: 413

Vanishing gradient and very low accuracy in InceptionV3

I'm working on a multi-class classification task with InceptionV3 in TensorFlow. I thought I had set everything up correctly, but the results are very strange and nowhere close to what I expected.

Epoch 1/10
58/58 [==============================] - 47s 591ms/step - loss: 0.0000e+00 - accuracy: 0.0279
Epoch 2/10
58/58 [==============================] - 38s 591ms/step - loss: 0.0000e+00 - accuracy: 0.0286
Epoch 3/10
58/58 [==============================] - 38s 596ms/step - loss: 0.0000e+00 - accuracy: 0.0249
Epoch 4/10
58/58 [==============================] - 38s 603ms/step - loss: 0.0000e+00 - accuracy: 0.0250

Here is my code.

This chunk of code is where I deal with the data. raw_train comes from oxford_iiit_pet.

import tensorflow as tf
import tensorflow_datasets as tfds

dataset, metadata = tfds.load('oxford_iiit_pet', with_info=True, as_supervised=True)
raw_train, raw_test = dataset['train'], dataset['test']
IMAGE_SIZE = (224, 224)


def preprocess_dataset(image, label):
  image = tf.cast(image, tf.float32)
  image = (image/127.5) - 1  # scale pixels to [-1, 1]; might need to fix this part
  image = tf.image.resize(image, IMAGE_SIZE)
  return image, label

BATCH_SIZE = 64
SHUFFLE_BUFFER_SIZE = 1024

train = raw_train.map(preprocess_dataset)
test = raw_test.map(preprocess_dataset)

train_batches = train.shuffle(SHUFFLE_BUFFER_SIZE).batch(BATCH_SIZE)
test_batches = test.batch(BATCH_SIZE)

This is my model.

IMG_SHAPE = (IMAGE_SIZE[0], IMAGE_SIZE[1], 3)
model_inception = tf.keras.applications.InceptionV3(input_shape=IMG_SHAPE, include_top = False, weights = 'imagenet')

model_inception.trainable = False # freeze the model

learning_rate = 0.001

model = tf.keras.Sequential([
    model_inception,
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

model.compile(
    optimizer = tf.keras.optimizers.RMSprop(lr=learning_rate),
    loss = 'categorical_crossentropy',
    metrics=['accuracy']
)

EPOCHS = 10

history = model.fit(
    train_batches,
    epochs = EPOCHS
)

I'm not entirely sure whether it's the way I preprocessed the data or the way I set up the model. Everything seems fine until I actually run the model. It looks like a vanishing gradient problem, but I don't know why it would happen or whether that's actually the issue. I've looked at examples of how the model is used, but nothing gives a clear answer.

Upvotes: 0

Views: 141

Answers (1)

mmiron

Reputation: 164

So, you're outputting a single value in the range [0.0, 1.0]. But you say there are multiple classes, not just one: you'll want the number of neurons in your last layer to equal the number of classes, and change the activation from sigmoid to softmax. Like this:

NUM_CLASSES = 8   # edit this number to suit your problem

model = tf.keras.Sequential([
    model_inception,
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dense(NUM_CLASSES, activation='softmax')
])
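
Since you loaded the dataset with with_info=True, you don't have to hard-code that number; as a small sketch (assuming the metadata object from your loading code is still in scope), you can read the class count straight from the tfds info:

NUM_CLASSES = metadata.features['label'].num_classes  # 37 breeds for oxford_iiit_pet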

A single sigmoid output won't work properly here, even though it will look like it is working (its values are floats between 0 and 1, just like the ones softmax would give you).

Frankly, I'm not even sure how you aren't getting some sort of error when calling fit(). That's one of the problems with Keras, imho: it's so user-friendly that it will "work" (which is to say, run, not actually work) even when it should be giving you a hint that you've set up your data improperly. Unless the training targets in your dataset consist of a single float value for every input image, it shouldn't even be running.
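
One more thing worth checking, and this is a sketch that goes slightly beyond the output layer: the labels tfds gives you here are integer class indices, not one-hot vectors, so categorical_crossentropy won't line up with a softmax over NUM_CLASSES outputs. Assuming you keep the integer labels as-is, the compile step would look something like this:

model.compile(
    optimizer=tf.keras.optimizers.RMSprop(learning_rate=learning_rate),
    # integer labels (0..NUM_CLASSES-1) pair with the sparse variant;
    # keep 'categorical_crossentropy' only if you one-hot encode the labels first
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)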

Upvotes: 1
