黃郁暉

Reputation: 567

Train MNIST with AlexNet

I am a beginner with Caffe. I trained MNIST with LeNet and ImageNet with AlexNet by following the tutorials, and got pretty good results. Then I tried to train MNIST with the AlexNet model. The train model is almost the same as models/bvlc_alexnet/train_val.prototxt, but changed in a few places:

layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    mirror: false  # set to false; crop_size and mean_file removed
  }
  data_param {
    source: "./mnist_train_lmdb"
    batch_size: 64
    backend: LMDB
  }
}
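For reference, the LMDB contents can be sanity-checked with a short snippet (a rough sketch, assuming the lmdb Python package and pycaffe are available, and that the LMDB was built by Caffe's standard MNIST conversion, which stores 1x28x28 grayscale datums):

import lmdb
from caffe.proto import caffe_pb2

# Read the first datum from the MNIST training LMDB and print its shape:
# the images are 1x28x28 grayscale, which is why crop_size and mean_file
# were dropped from transform_param.
env = lmdb.open('./mnist_train_lmdb', readonly=True)
with env.begin() as txn:
    key, value = next(iter(txn.cursor()))
    datum = caffe_pb2.Datum()
    datum.ParseFromString(value)
    print(datum.channels, datum.height, datum.width, datum.label)  # 1 28 28 <label>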

......

layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  transform_param {
    mirror: false  # set to false; crop_size and mean_file removed
  }
  data_param {
    source: "./mnist_train_lmdb"
    batch_size: 64
    backend: LMDB
  }
}

......

layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 96
    kernel_size: 3  # changed to 3
    stride: 2       # changed to 2
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
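The kernel/stride change matters because the MNIST inputs are so small. Caffe computes a conv layer's output size as floor((input + 2*pad - kernel) / stride) + 1; a quick sketch comparing the original AlexNet conv1 (kernel 11, stride 4) with the modified one on a 28x28 input:

# Conv output size in Caffe: out = floor((in + 2*pad - kernel) / stride) + 1
def conv_out(in_size, kernel, stride, pad=0):
    return (in_size + 2 * pad - kernel) // stride + 1

print(conv_out(28, 11, 4))  # 5  -- original AlexNet conv1 would leave only 5x5
print(conv_out(28, 3, 2))   # 13 -- the modified conv1 keeps 13x13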

......

layer {
  name: "fc8"
  type: "InnerProduct"
  bottom: "fc7"
  top: "fc8"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 10  # changed to 10 (MNIST has 10 classes)
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}

and the solver.prototxt is as follows:

net: "./train_val.prototxt"
test_iter: 1000
test_interval: 100
base_lr: 0.01
lr_policy: "inv"
power: 0.75
gamma: 0.1
stepsize: 1000
display: 100
max_iter: 100000
momentum: 0.9
weight_decay: 0.0005
snapshot: 5000
snapshot_prefix: "./caffe_alexnet_train"
solver_mode: GPU
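One note on this solver: with lr_policy: "inv", Caffe sets the learning rate to base_lr * (1 + gamma * iter)^(-power); the stepsize field is not used by the "inv" policy, so it has no effect here. A small sketch of the resulting schedule:

# Caffe's "inv" policy: lr = base_lr * (1 + gamma * iter)^(-power)
base_lr, gamma, power = 0.01, 0.1, 0.75

def inv_lr(iteration):
    return base_lr * (1 + gamma * iteration) ** (-power)

for it in (0, 100, 1000, 100000):
    print(it, inv_lr(it))
# 0      -> 0.01
# 100    -> ~0.0017
# 1000   -> ~0.00031
# 100000 -> ~0.00001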

After 100,000 training iterations, the accuracy reached about 0.97:

I0315 19:28:54.827383 26505 solver.cpp:258]     Train net output #0: loss = 0.0331752 (* 1 = 0.0331752 loss)

......

I0315 19:28:56.384718 26505 solver.cpp:351] Iteration 100000, Testing net (#0)
I0315 19:28:58.121800 26505 solver.cpp:418]     Test net output #0: accuracy = 0.974875
I0315 19:28:58.121834 26505 solver.cpp:418]     Test net output #1: loss = 0.0804802 (* 1 = 0.0804802 loss)

Then I used the following Python script to predict a single picture from the test set:

import sys

# Make sure pycaffe is on the path before importing caffe
caffe_root = '/home/ubuntu/pkg/local/caffe'
sys.path.insert(0, caffe_root + '/python')

import caffe

MODEL_FILE = './deploy.prototxt'
PRETRAINED = './caffe_alexnet_train_iter_100000.caffemodel'
IMAGE_FILE = './4307.png'

caffe.set_mode_cpu()  # select the mode before constructing the net

# Load the test image as single-channel grayscale
input_image = caffe.io.load_image(IMAGE_FILE, color=False)

net = caffe.Classifier(MODEL_FILE, PRETRAINED)
prediction = net.predict([input_image], oversample=False)

print('predicted class: ', prediction[0].argmax())
print('predicted class all: ', prediction[0])

but the prediction is wrong (this script predicts well on MNIST with LeNet), and the per-class probabilities look odd as well:

predicted class:  9    # the correct label is 5

predicted class all:  [0.01998338 0.14941786 0.09392905 0.07361069 0.07640345 0.10996494 0.03646726 0.12371133 0.15246753 0.16404454]
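The distribution is in fact close to uniform over the 10 classes (uniform would be 0.1 each), i.e. the net is barely more confident than random guessing. A quick check:

import numpy as np

# The predicted distribution from above: every class sits near 1/10 = 0.1.
p = np.array([0.01998338, 0.14941786, 0.09392905, 0.07361069, 0.07640345,
              0.10996494, 0.03646726, 0.12371133, 0.15246753, 0.16404454])
print(p.sum())  # ~1.0
print(p.max())  # 0.164 -- the top class is barely above uniform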

The deploy.prototxt is almost the same as models/bvlc_alexnet/deploy.prototxt, but with the same changes as in train_val.prototxt.

Any suggestions?

Upvotes: 0

Views: 1131

Answers (1)

Prune

Reputation: 77880

AlexNet was designed to discriminate among 1000 classes, training on 1.3M input images of (canonically) 256x256x3 data values each. You're using essentially the same tool to handle 10 classes with 28x28x1 input.

Very simply, you're over-fitting by design.

If you want to use the general AlexNet design to handle this far simpler job, you'll need to scale it down appropriately. It will take some experimentation to find a workable definition of "appropriately": narrow the conv layers by some factor, add dropout, cut out one conv layer entirely, ...
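For a rough sense of the mismatch (a back-of-the-envelope sketch using the canonical AlexNet fully-connected shapes; the modified net's conv output will differ, but the order of magnitude stands):

# Canonical AlexNet FC parameter counts: fc6 (256*6*6 -> 4096),
# fc7 (4096 -> 4096), fc8 (4096 -> 10 after the num_output change).
fc6 = 256 * 6 * 6 * 4096  # 37,748,736 weights
fc7 = 4096 * 4096         # 16,777,216 weights
fc8 = 4096 * 10           # 40,960 weights
print(fc6 + fc7 + fc8)    # ~54.6M parameters vs. MNIST's 60,000 training images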

Upvotes: 0
