Virange

Reputation: 231

TensorFlow returns 10% validation accuracy for VGG model (irrespective of number of epochs)?

I am trying to train a neural network on CIFAR-10 using the Keras package in TensorFlow. The network is VGG-16, which I borrowed directly from the official Keras models. The definition is:

from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense
from tensorflow.keras.models import Model

def cnn_model(nb_classes=10):
    # VGG-16 official keras model
    img_input = Input(shape=(32, 32, 3))

    # Block 1
    vgg_layer = Conv2D(64, (3, 3), activation='relu', padding='same', name='block1_conv1')(img_input)
    vgg_layer = Conv2D(64, (3, 3), activation='relu', padding='same', name='block1_conv2')(vgg_layer)
    vgg_layer = MaxPooling2D((2, 2), strides=(2, 2), name='block1_pool')(vgg_layer)

    # Block 2
    vgg_layer = Conv2D(64, (3, 3), activation='relu', padding='same', name='block2_conv1')(vgg_layer)
    vgg_layer = Conv2D(64, (3, 3), activation='relu', padding='same', name='block2_conv2')(vgg_layer)
    vgg_layer = MaxPooling2D((2, 2), strides=(2, 2), name='block2_pool')(vgg_layer)

    # Block 3
    vgg_layer = Conv2D(128, (3, 3), activation='relu', padding='same', name='block3_conv1')(vgg_layer)
    vgg_layer = Conv2D(128, (3, 3), activation='relu', padding='same', name='block3_conv2')(vgg_layer)
    vgg_layer = Conv2D(128, (3, 3), activation='relu', padding='same', name='block3_conv3')(vgg_layer)
    vgg_layer = MaxPooling2D((2, 2), strides=(2, 2), name='block3_pool')(vgg_layer)

    # Block 4
    vgg_layer = Conv2D(256, (3, 3), activation='relu', padding='same', name='block4_conv1')(vgg_layer)
    vgg_layer = Conv2D(256, (3, 3), activation='relu', padding='same', name='block4_conv2')(vgg_layer)
    vgg_layer = Conv2D(256, (3, 3), activation='relu', padding='same', name='block4_conv3')(vgg_layer)
    vgg_layer = MaxPooling2D((2, 2), strides=(2, 2), name='block4_pool')(vgg_layer)

    # Classification block
    vgg_layer = Flatten(name='flatten')(vgg_layer)
    vgg_layer = Dense(1024, activation='relu', name='fc1')(vgg_layer)
    vgg_layer = Dense(1024, activation='relu', name='fc2')(vgg_layer)
    vgg_layer = Dense(nb_classes, activation='softmax', name='predictions')(vgg_layer)

    return Model(inputs=img_input, outputs=vgg_layer)

However, during training I always get both the training and validation accuracy as 0.1, i.e. 10%:

validation accuracy for adv. training of model for epoch 1=  0.1
validation accuracy for adv. training of model for epoch 2=  0.1
validation accuracy for adv. training of model for epoch 3=  0.1
validation accuracy for adv. training of model for epoch 4=  0.1
validation accuracy for adv. training of model for epoch 5=  0.1
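
For context, the surrounding training code looks roughly like this (a minimal sketch; the optimizer, batch size, and preprocessing here are assumptions for illustration, not copied from my actual script):

import numpy as np
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.utils import to_categorical

(x_train, y_train), (x_test, y_test) = cifar10.load_data()
x_train = x_train.astype('float32') / 255.0   # scale pixels to [0, 1]
x_test = x_test.astype('float32') / 255.0
y_train = to_categorical(y_train, 10)          # one-hot labels for categorical_crossentropy
y_test = to_categorical(y_test, 10)

model = cnn_model(nb_classes=10)
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train,
          batch_size=128,
          epochs=5,
          validation_data=(x_test, y_test))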

As a debugging step, whenever I replace it with any other model (e.g., any simple CNN model) it works perfectly well, which shows that the rest of the script is fine.

For example, the following CNN model works perfectly well and reaches an accuracy of 75% after 30 epochs.

from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv2D, MaxPooling2D, BatchNormalization,
                                     Dropout, Flatten, Dense)

def cnn_model(nb_classes=10, num_hidden=1024, weight_decay=0.0001, cap_factor=4):
    model = Sequential()
    input_shape = (32, 32, 3)
    model.add(Conv2D(32*cap_factor, kernel_size=(3,3), strides=(1,1), kernel_regularizer=keras.regularizers.l2(weight_decay), kernel_initializer="he_normal", activation='relu', padding='same', input_shape=input_shape))
    model.add(Conv2D(32*cap_factor, kernel_size=(3,3), strides=(1,1), kernel_regularizer=keras.regularizers.l2(weight_decay), kernel_initializer="he_normal", activation="relu", padding="same"))
    model.add(MaxPooling2D(pool_size=(2,2), strides=(2,2)))
    model.add(BatchNormalization())
    model.add(Dropout(0.25))

    model.add(Conv2D(64*cap_factor, kernel_size=(3,3), strides=(1,1), kernel_regularizer=keras.regularizers.l2(weight_decay), kernel_initializer="he_normal", activation="relu", padding="same"))
    model.add(Conv2D(64*cap_factor, kernel_size=(3,3), strides=(1,1), kernel_regularizer=keras.regularizers.l2(weight_decay), kernel_initializer="he_normal", activation="relu", padding="same"))
    model.add(MaxPooling2D(pool_size=(2,2), strides=(2,2)))
    model.add(BatchNormalization())
    model.add(Dropout(0.25))

    model.add(Flatten())
    model.add(Dense(num_hidden, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(nb_classes, activation='softmax'))
    return model

It appears to me that both models are correctly defined, yet one works perfectly while the other doesn't learn at all. I also tried writing the VGG model as a Sequential model, i.e. in the same style as the second one, but it still gave me 10% accuracy.

Even if the model didn't update any weights at all, the "he_normal" initializer alone should easily obtain better accuracy than pure chance. It appears that TensorFlow is somehow computing the output logits of this model in a way that results in chance-level accuracy.

I would be really grateful if someone could point out my mistake here.

Upvotes: 1

Views: 639

Answers (1)

azrev

Reputation: 127

Your 10% corresponds suspiciously well with the number of classes (10). That makes me think that, regardless of the training, your model always predicts the same class for every input, which constantly gives you 10% accuracy on 10 classes.

  1. Check the output of the untrained model and see whether it always predicts the same class.
  2. If so, check the initial weights of the model; it is probably wrongly initialized, the gradients are zero, and it can't converge. Both checks are sketched below.
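
A rough sketch of both checks (cnn_model is the VGG function from the question):

import numpy as np
from tensorflow.keras.datasets import cifar10

model = cnn_model(nb_classes=10)              # untrained model from the question

# 1. Does the untrained model map every input to the same class?
(x_train, _), _ = cifar10.load_data()
probs = model.predict(x_train[:256].astype('float32') / 255.0)
print(np.unique(probs.argmax(axis=1)))        # a single class id here means a constant prediction

# 2. Inspect the initial weights; near-zero or degenerate kernels kill the gradients.
for layer in model.layers:
    for w in layer.get_weights():
        print(layer.name, w.shape, float(np.abs(w).mean()))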

Upvotes: 2
