I am a beginner with Caffe. I trained MNIST with LeNet and ImageNet with AlexNet by following the tutorials, and got pretty good results. Then I tried to train MNIST with the AlexNet model. The train model is almost the same as models/bvlc_alexnet/train_val.prototxt, changed in a few places like:
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    mirror: false  # <-- set to false; crop_size and mean_file removed
  }
  data_param {
    source: "./mnist_train_lmdb"
    batch_size: 64
    backend: LMDB
  }
}
......
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  transform_param {
    mirror: false  # <-- set to false; crop_size and mean_file removed
  }
  data_param {
    source: "./mnist_test_lmdb"
    batch_size: 64
    backend: LMDB
  }
}
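Since crop_size and mean_file were removed, the data layer feeds the raw 28x28 grayscale pixels to the net unchanged. For reference, a quick way to confirm what is stored in the LMDB (a sketch using the lmdb Python bindings and the path above; the key handling follows the standard MNIST example):

import lmdb
import caffe
from caffe.proto import caffe_pb2

# Read the first record back out of the training LMDB
env = lmdb.open('./mnist_train_lmdb', readonly=True)
with env.begin() as txn:
    key, value = next(txn.cursor().iternext())
    datum = caffe_pb2.Datum()
    datum.ParseFromString(value)
    arr = caffe.io.datum_to_array(datum)  # (channels, height, width)
    print(arr.shape)  # expect (1, 28, 28) for MNIST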
......
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 96
    kernel_size: 3  # <-- changed to 3
    stride: 2       # <-- changed to 2
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
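With kernel_size 3, stride 2, and no padding, conv1 produces a 13x13 map from a 28x28 input, versus 55x55 from 227x227 in stock AlexNet; Caffe computes floor((in + 2*pad - kernel) / stride) + 1. A small sketch of that arithmetic:

# Sketch of Caffe's convolution output-size formula (integer division = floor)
def conv_out(in_size, kernel, stride, pad=0):
    return (in_size + 2 * pad - kernel) // stride + 1

print(conv_out(28, kernel=3, stride=2))    # 13 -> conv1 output on MNIST
print(conv_out(227, kernel=11, stride=4))  # 55 -> stock AlexNet conv1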
......
layer {
  name: "fc8"
  type: "InnerProduct"
  bottom: "fc7"
  top: "fc8"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 10  # <-- changed to 10
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
and the solver.prototxt is as follows:
net: "./train_val.prototxt"
test_iter: 1000
test_interval: 100
base_lr: 0.01
lr_policy: "inv"
power: 0.75
gamma: 0.1
stepsize: 1000
display: 100
max_iter: 100000
momentum: 0.9
weight_decay: 0.0005
snapshot: 5000
snapshot_prefix: "./caffe_alexnet_train"
solver_mode: GPU
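For what it's worth, with lr_policy: "inv" the learning rate is base_lr * (1 + gamma * iter)^(-power), and stepsize is ignored (it only affects the "step" policies). A sketch of the schedule these settings produce:

# Sketch of Caffe's "inv" learning-rate policy with the solver settings above
base_lr, gamma, power = 0.01, 0.1, 0.75

def inv_lr(it):
    return base_lr * (1.0 + gamma * it) ** (-power)

for it in (0, 1000, 100000):
    print(it, inv_lr(it))  # decays from 0.01 to roughly 1e-5 at iteration 100000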
After 100,000 training iterations, the accuracy reached about 0.97:
I0315 19:28:54.827383 26505 solver.cpp:258] Train net output #0: loss = 0.0331752 (* 1 = 0.0331752 loss)
......
I0315 19:28:56.384718 26505 solver.cpp:351] Iteration 100000, Testing net (#0)
I0315 19:28:58.121800 26505 solver.cpp:418] Test net output #0: accuracy = 0.974875
I0315 19:28:58.121834 26505 solver.cpp:418] Test net output #1: loss = 0.0804802 (* 1 = 0.0804802 loss)
Then I used this Python script to predict a single picture from the test set:
import os
import sys
import numpy as np
import matplotlib.pyplot as plt

# Put pycaffe on the path before importing caffe
caffe_root = '/home/ubuntu/pkg/local/caffe/'
sys.path.insert(0, caffe_root + 'python')
import caffe

MODEL_FILE = './deploy.prototxt'
PRETRAINED = './caffe_alexnet_train_iter_100000.caffemodel'
IMAGE_FILE = './4307.png'

caffe.set_mode_cpu()  # select the mode before running inference
input_image = caffe.io.load_image(IMAGE_FILE, color=False)  # HxWx1, floats in [0, 1]
net = caffe.Classifier(MODEL_FILE, PRETRAINED)
prediction = net.predict([input_image], oversample=False)
print('predicted class: ', prediction[0].argmax())
print('predicted class all: ', prediction[0])
but the prediction is wrong (this script predicts well on MNIST with LeNet), and the per-class probabilities look odd, too:
predicted class: 9 <------------- the correct label is 5
predicted class all: [0.01998338 0.14941786 0.09392905 0.07361069 0.07640345 0.10996494 0.03646726 0.12371133 0.15246753 0.16404454]
The deploy.prototxt is almost the same as models/bvlc_alexnet/deploy.prototxt, but with the same changes as in train_val.prototxt.
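One quick sanity check for an input-shape mismatch, assuming the deploy input blob is named "data" (a sketch to append to the script above):

# Compare what the deploy net expects with what load_image returned
print(net.blobs['data'].data.shape)  # e.g. (10, 3, 227, 227) for stock AlexNet
print(input_image.shape)             # (28, 28, 1) with color=False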
Any suggestions?
AlexNet was designed to discriminate among 1000 classes, training on 1.3M input images of (canonically) 256x256x3 data values each. You're using essentially the same tool to handle 10 classes with 28x28x1 input.
Very simply, you're over-fitting by design.
If you want to use the general AlexNet design to handle this far simpler job, you'll need to scale it down appropriately. It will take some experimentation to find a workable definition of "appropriately": narrow the conv layers by some factor, add dropout, cut out one conv layer entirely, ...
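For illustration only, a minimal sketch of one such scaled-down direction using pycaffe's NetSpec; the width, depth, and dropout ratio here are placeholders to experiment with, not a tested recipe:

import caffe
from caffe import layers as L, params as P

n = caffe.NetSpec()
# Hypothetical scaled-down net: one narrow conv stage plus dropout
n.data, n.label = L.Data(batch_size=64, backend=P.Data.LMDB,
                         source='./mnist_train_lmdb', ntop=2)
n.conv1 = L.Convolution(n.data, kernel_size=3, stride=2, num_output=16,
                        weight_filler=dict(type='gaussian', std=0.01))
n.relu1 = L.ReLU(n.conv1, in_place=True)
n.drop1 = L.Dropout(n.relu1, dropout_ratio=0.5, in_place=True)
n.fc = L.InnerProduct(n.drop1, num_output=10,
                      weight_filler=dict(type='gaussian', std=0.01))
n.loss = L.SoftmaxWithLoss(n.fc, n.label)
print(n.to_proto())  # emits a train_val-style prototxt to adapt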