siby

Reputation: 777

Expected validation accuracy for Keras Mobile Net V1 for CIFAR-10 (training from scratch)

Has anybody trained MobileNet V1 from scratch on CIFAR-10? What was the maximum accuracy you got? My validation accuracy is stuck at about 70% after 110 epochs, even though my training accuracy is above 99%. Here is how I am creating the model.

import tensorflow as tf
from tensorflow.keras.layers import Input, Flatten, Dense
from tensorflow.keras.models import Model

# Create the MobileNet backbone (no classifier head, randomly initialized weights)
MobileNet_model = tf.keras.applications.MobileNet(include_top=False, weights=None)

# Must define the input shape in the first layer of the neural network
x = Input(shape=(32, 32, 3), name='input')

# Create the custom model: backbone followed by a small classifier head
model = MobileNet_model(x)
model = Flatten(name='flatten')(model)
model = Dense(1024, activation='relu', name='dense_1')(model)
output = Dense(10, activation=tf.nn.softmax, name='output')(model)
model_regular = Model(x, output, name='model_regular')

I used the Adam optimizer with a learning rate of 0.001, amsgrad=True, and a batch size of 64. I also normalized the pixel data by dividing by 255.0. I am not using any data augmentation.

# `learning_rate` is the current name of the argument formerly called `lr`
optimizer1 = tf.keras.optimizers.Adam(learning_rate=0.001, amsgrad=True)
model_regular.compile(optimizer=optimizer1, loss='categorical_crossentropy', metrics=['accuracy'])

# Train the model, validating on the test split after every epoch
history = model_regular.fit(x_train, y_train_one_hot, validation_data=(x_test, y_test_one_hot), batch_size=64, epochs=100)

I think I am supposed to get at least 75% according to https://arxiv.org/abs/1712.04698. Am I doing anything wrong, or is this the expected accuracy after 100 epochs? Here is a plot of my validation accuracy.

[Image: plot of validation accuracy]

Upvotes: 1

Views: 3491

Answers (3)

Tên Ko

Reputation: 1

Start training with:

python main.py

You can manually resume the training with:

python main.py --resume --lr=0.01

Upvotes: 0

MonsieurBeilto

Reputation: 938

The OP asked about MobileNetV1. Since MobileNetV2 has been published in the meantime, here is an update on training MobileNetV2 on CIFAR-10:

1) MobileNetV2 is tuned primarily for ImageNet, with an input resolution of 224x224. It has 5 convolution operations with stride 2, so the GlobalAvgPool2D (penultimate layer) receives a feature map of Cx7x7, where C is the number of filters (1280 for MobileNetV2).

2) For CIFAR-10, I changed the stride in the first three of these layers to 1, so the GlobalAvgPool2D receives a feature map of Cx8x8. I also trained with a width multiplier of 0.25 (this scales the number of channels in each layer). I trained with mixup in MXNet (https://gluon-cv.mxnet.io/model_zoo/classification.html). This gave me a validation accuracy of 93.27%.

3) Another MobileNetV2 implementation that seems to work well on CIFAR-10 is PyTorch-CIFAR, with a reported accuracy of 94.43%. This implementation sets the stride to 1 in the first two of the original downsampling layers, and it uses the full channel width as used for ImageNet.

4) Further, I trained a MobileNetV2 on CIFAR-10 with mixup while changing only the stride of the first conv layer from 2 to 1 and using the full width (width multiplier of 1.0). In this case the GlobalAvgPool2D (penultimate layer) receives a feature map of Cx2x2. This gave me an accuracy of 92.31%.
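
The feature map sizes quoted in the points above follow from simple arithmetic: every stride-2 layer halves the spatial resolution. Here is a minimal sketch of that calculation, assuming 'same'-style padding so that each stride-2 layer maps a side length n to ceil(n / 2). The function name is only for illustration and is not part of any library.

```python
def final_feature_map_size(input_size: int, num_stride2_layers: int) -> int:
    """Spatial side length after repeated stride-2 downsampling.

    Assumption: 'same'-style padding, so each stride-2 layer maps n to ceil(n / 2).
    """
    size = input_size
    for _ in range(num_stride2_layers):
        size = (size + 1) // 2  # integer form of ceil(size / 2)
    return size


# ImageNet setting: 224x224 input with all 5 stride-2 layers kept
print("224 input, 5 stride-2 layers:", final_feature_map_size(224, 5))

# CIFAR-10 setting: 32x32 input with some stride-2 layers changed to stride 1
for kept in (5, 4, 3, 2):
    print("32 input,", kept, "stride-2 layers:", final_feature_map_size(32, kept))
```

With an unmodified network a 32x32 image is reduced all the way to 1x1 before pooling, which is the reason for changing some strides to 1 on CIFAR-10.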

Upvotes: 2

SoonYau

Reputation: 111

MobileNet was designed for ImageNet, which is a much larger dataset, so training it on CIFAR-10 will almost inevitably result in overfitting. I would suggest plotting the loss (not the accuracy) for both training and validation, training until you reach about 99% training accuracy, and then observing the validation loss. If the model is overfitting, you will see the validation loss start to increase again after reaching its minimum.
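
As a rough illustration of that check, the sketch below finds the epoch with the lowest validation loss and reports whether the loss has risen since then. It assumes the per-epoch losses are available as a plain list, e.g. `history.history['val_loss']` from the `fit` call in the question. The function name, the `patience` threshold, and the example numbers are made up for illustration.

```python
def overfitting_report(val_loss, patience=5):
    """Locate the validation-loss minimum and check whether the loss rose afterwards.

    val_loss: list of per-epoch validation losses.
    patience: hypothetical threshold, the number of epochs past the minimum
              required before the run is flagged as overfitting.
    """
    best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__)
    epochs_since_best = len(val_loss) - 1 - best_epoch
    rose_after_minimum = val_loss[-1] > val_loss[best_epoch]
    return {
        "best_epoch": best_epoch,
        "best_val_loss": val_loss[best_epoch],
        "final_val_loss": val_loss[-1],
        "overfitting": epochs_since_best >= patience and rose_after_minimum,
    }


# Made-up losses: they fall for a few epochs, then climb steadily
example_val_loss = [1.9, 1.4, 1.1, 0.95, 0.9, 1.0, 1.2, 1.4, 1.6, 1.9, 2.1]
report = overfitting_report(example_val_loss)
print(report)
```

Plotting the training loss on the same axes makes the picture clearer: training loss keeps falling while validation loss turns upward.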

A few things to try to reduce overfitting:

  • add dropout before the fully connected layer
  • data augmentation - random shift, crop and rotation should be enough
  • use a smaller width multiplier, e.g. 0.75 or 0.5, to make the layers thinner (see the original paper; it basically just reduces the number of filters per layer)
  • use L2 weight regularization and weight decay
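
To make the data augmentation item concrete, here is a minimal NumPy sketch of random shift and crop: each image is zero-padded and a window of the original size is cut out at a random offset. This is only one possible way to do it, rotation is left out, and the function name and the 4-pixel padding are assumptions for illustration.

```python
import numpy as np


def random_shift_crop(images, pad=4, rng=None):
    """Randomly shift each image by up to `pad` pixels via zero-padding and cropping.

    images: array of shape (N, H, W, C). Returns an array of the same shape and dtype.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, h, w, _ = images.shape
    # Pad only the two spatial axes, leaving batch and channel axes untouched
    padded = np.pad(images, ((0, 0), (pad, pad), (pad, pad), (0, 0)), mode="constant")
    out = np.empty_like(images)
    for i in range(n):
        # Offsets in [0, 2 * pad]; an offset equal to `pad` reproduces the original image
        top = rng.integers(0, 2 * pad + 1)
        left = rng.integers(0, 2 * pad + 1)
        out[i] = padded[i, top:top + h, left:left + w, :]
    return out


# Stand-in batch with CIFAR-10 dimensions (random values, not the real dataset)
batch = np.random.default_rng(0).random((8, 32, 32, 3)).astype("float32")
augmented = random_shift_crop(batch, pad=4, rng=np.random.default_rng(1))
print(augmented.shape, augmented.dtype)
```

The augmentation should be applied freshly to every training batch, and never to the validation data.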

Then there are some usual training tricks:

  • use learning rate decay e.g. reduce the learning rate from 1e-2 to 1e-4 stepwise or exponentially

With some hyperparameter search, I got an evaluation loss of 0.85. I didn't use Keras; I wrote MobileNet myself in TensorFlow.

Upvotes: 3
