Michael Steijlen
Michael Steijlen

Reputation: 3

How can I tune my neural network to avoid overfitting the mnist data set?

!!!!!!!!!TL;DR at the bottom!!!!!!!!

In an attempt to learn the in's and out's of ML, I have been implementing a neural network optimizer in c++ and wrapped it with swig as a python module. Of course, the first problem I tackled was XOR via the following snip of code: 2 input layers, 2 hidden layers, 1 output layer.

from MikeLearn import NeuralNetwork
from MikeLearn import ClassificationOptimizer
import time

#=======================================================
# Training Set
#=======================================================

X = [[0,1],[1,0],[1,1],[0,0]]
Y = [[1],[1],[0],[0]]

nIn = len(X[0])
nOut = len(Y[0])

#=======================================================
# Model
#=======================================================
verbosity = 0

#Initualize neural network
# NeuralNetwork([nInputs, nHidden1, nHidden2,..,nOutputs],['Activation1','Activation2'...]
N = NeuralNetwork([nIn,2,nOut],['sigmoid','sigmoid'])
N.setLoggerVerbosity(verbosity)

#Initialize the classification optimizer
#ClassificationOptimizer(NeuralNetwork,Xtrain,Ytrain)
Opt = ClassificationOptimizer(N,X,Y)
Opt.setLoggerVerbosity(verbosity)

start_time = time.time();

#fit data
#fit(nEpoch,LearningRate)
E = Opt.fit(10000,0.1)
print("--- %s seconds ---" % (time.time() - start_time))

#Make a prediction
print(Opt.predict(X))

This snippet of code yields the following output (Correct answer would be [1,1,0,0])

--- 0.10273098945617676 seconds ---
((0.9398755431175232,), (0.9397522211074829,), (0.0612373948097229,), (0.04882470518350601,))
>>>

Looks great! Now for the issue. The following snippet of code tries to learn from the mnist dataset, but suffers very obviously from overfitting. ~750 input (28X28 pixels), 50 hidden, 10 output

from MikeLearn import NeuralNetwork
from MikeLearn import ClassificationOptimizer
import matplotlib.pyplot as plt
import numpy as np
import pickle
import time

#=======================================================
# Data Set
#=======================================================

#load the data dictionary
modeldata = pickle.load( open( "mnist_data.p", "rb" ) )
X = modeldata['X']
Y = modeldata['Y']

#normalize data
X = np.array(X)
X = X/255
X = X.tolist()

#training set
X1 = X[0:49999]
Y1 = Y[0:49999]

#validation set
X2 = X[50000:59999]
Y2 = Y[50000:59999]

#number of inputs/outputs
nIn = len(X[0]) #~750
nOut = len(Y[0]) #=10

#=======================================================
# Model
#=======================================================
verbosity = 1

#Initualize neural network
# NeuralNetwork([nInputs, nHidden1, nHidden2,..,nOutputs],['Activation1','Activation2'...]
N = NeuralNetwork([nIn,50,nOut],['sigmoid','sigmoid'])
N.setLoggerVerbosity(verbosity)

#Initialize optimizer
#ClassificationOptimizer(NeuralNetwork,Xtrain,Ytrain)
Opt = ClassificationOptimizer(N,X1,Y1)
Opt.setLoggerVerbosity(verbosity)

start_time = time.time();
#fit data
#fit(nEpoch,LearningRate)
E = Opt.fit(10,0.1)
print("--- %s seconds ---" % (time.time() - start_time))

#================================
#Final Accuracy on training set
#================================
XL = Opt.predict(X1)

correct = 0
for i,x in enumerate(XL):
    if XL[i].index(max(XL[i])) == Y[i].index(max(Y1[i])):
        correct = correct + 1

print("Training set Correct = " +  str(correct))
Accuracy = correct/len(XL)*100;
print("Accuracy = " + str(Accuracy) + '%')

#================================
#Final Accuracy on validation set
#================================
XL = Opt.predict(X2)

correct = 0
for i,x in enumerate(XL):
    if XL[i].index(max(XL[i])) == Y[i].index(max(Y2[i])):
        correct = correct + 1

print("Testing set Correct = " +  str(correct))
Accuracy = correct/len(XL)*100;
print("Accuracy = " + str(Accuracy)+'%')

That snippet of code yields the following output which shows the training accuracy and validation accuracy.

-------------------------
Epoch
9
-------------------------
E=
0.00696964
E=
0.350509
E=
3.49568e-05
E=
4.09073e-06
E=
1.38491e-06
E=
0.229873
E=
3.60186e-05
E=
0.000115187
E=
2.29978e-06
E=
2.69165e-06
--- 27.400235176086426 seconds ---
Training set Correct = 48435
Accuracy = 96.87193743874877%
Testing set Correct = 982
Accuracy = 9.820982098209821%

The training set accuracy is great, but then the testing set is no better than a random guess. Any idea what could be causing this?

TL;DR

  1. Solved XOR with a model 2 inputs, 2 hidden, 1 output and sigmoid activation functions. Good results.
  2. Tried to solve the Mnist data set with a model of 750 inputs (28X28 pixels), 50 hidden, 10 output and sigmoid activation functions. Severe overfitting issue. 95% accuracy on the training set, 10% accuracy on validation set.

Any Idea what is causing this?

Upvotes: 0

Views: 213

Answers (1)

Willem Hendriks
Willem Hendriks

Reputation: 1497

The cause of overfitting is a combination of the data and model (network in this case). During the training is was 'lazy' and found aspects of the data that worked well in training data but not generalize well.

It is difficult/impossible to point out exactly where in the trained network the nodes/weights are located that are responsible for overfitting.

But we can avoid overfitting with several tricks:

  1. Regularisation
  2. Drop-out (easier to implement)
  3. Change Network Architecture (less layers/less nodes/more dimension-reduction)

https://machinelearningmastery.com/dropout-for-regularizing-deep-neural-networks/

To get an idea of regularization, try the playground from tensorflow:

https://playground.tensorflow.org/

A visualisation of dropout

https://yusugomori.com/projects/deep-learning/dropout-relu

Besides try out regularisation techniques, also experiments with different NN architectures.

Upvotes: 1

Related Questions