How can I tune my neural network to avoid overfitting the mnist data set?

Question

!!!!!!!!!TL;DR at the bottom!!!!!!!!

In an attempt to learn the in's and out's of ML, I have been implementing a neural network optimizer in c++ and wrapped it with swig as a python module. Of course, the first problem I tackled was XOR via the following snip of code: 2 input layers, 2 hidden layers, 1 output layer.

from MikeLearn import NeuralNetwork
from MikeLearn import ClassificationOptimizer
import time

#=======================================================
# Training Set
#=======================================================

X = [[0,1],[1,0],[1,1],[0,0]]
Y = [[1],[1],[0],[0]]

nIn = len(X[0])
nOut = len(Y[0])

#=======================================================
# Model
#=======================================================
verbosity = 0

#Initualize neural network
# NeuralNetwork([nInputs, nHidden1, nHidden2,..,nOutputs],['Activation1','Activation2'...]
N = NeuralNetwork([nIn,2,nOut],['sigmoid','sigmoid'])
N.setLoggerVerbosity(verbosity)

#Initialize the classification optimizer
#ClassificationOptimizer(NeuralNetwork,Xtrain,Ytrain)
Opt = ClassificationOptimizer(N,X,Y)
Opt.setLoggerVerbosity(verbosity)

start_time = time.time();

#fit data
#fit(nEpoch,LearningRate)
E = Opt.fit(10000,0.1)
print("--- %s seconds ---" % (time.time() - start_time))

#Make a prediction
print(Opt.predict(X))

This snippet of code yields the following output (Correct answer would be [1,1,0,0])

--- 0.10273098945617676 seconds ---
((0.9398755431175232,), (0.9397522211074829,), (0.0612373948097229,), (0.04882470518350601,))
>>>

Looks great! Now for the issue. The following snippet of code tries to learn from the mnist dataset, but suffers very obviously from overfitting. ~750 input (28X28 pixels), 50 hidden, 10 output

from MikeLearn import NeuralNetwork
from MikeLearn import ClassificationOptimizer
import matplotlib.pyplot as plt
import numpy as np
import pickle
import time

#=======================================================
# Data Set
#=======================================================

#load the data dictionary
modeldata = pickle.load( open( "mnist_data.p", "rb" ) )
X = modeldata['X']
Y = modeldata['Y']

#normalize data
X = np.array(X)
X = X/255
X = X.tolist()

#training set
X1 = X[0:49999]
Y1 = Y[0:49999]

#validation set
X2 = X[50000:59999]
Y2 = Y[50000:59999]

#number of inputs/outputs
nIn = len(X[0]) #~750
nOut = len(Y[0]) #=10

#=======================================================
# Model
#=======================================================
verbosity = 1

#Initualize neural network
# NeuralNetwork([nInputs, nHidden1, nHidden2,..,nOutputs],['Activation1','Activation2'...]
N = NeuralNetwork([nIn,50,nOut],['sigmoid','sigmoid'])
N.setLoggerVerbosity(verbosity)

#Initialize optimizer
#ClassificationOptimizer(NeuralNetwork,Xtrain,Ytrain)
Opt = ClassificationOptimizer(N,X1,Y1)
Opt.setLoggerVerbosity(verbosity)

start_time = time.time();
#fit data
#fit(nEpoch,LearningRate)
E = Opt.fit(10,0.1)
print("--- %s seconds ---" % (time.time() - start_time))

#================================
#Final Accuracy on training set
#================================
XL = Opt.predict(X1)

correct = 0
for i,x in enumerate(XL):
    if XL[i].index(max(XL[i])) == Y[i].index(max(Y1[i])):
        correct = correct + 1

print("Training set Correct = " +  str(correct))
Accuracy = correct/len(XL)*100;
print("Accuracy = " + str(Accuracy) + '%')

#================================
#Final Accuracy on validation set
#================================
XL = Opt.predict(X2)

correct = 0
for i,x in enumerate(XL):
    if XL[i].index(max(XL[i])) == Y[i].index(max(Y2[i])):
        correct = correct + 1

print("Testing set Correct = " +  str(correct))
Accuracy = correct/len(XL)*100;
print("Accuracy = " + str(Accuracy)+'%')

That snippet of code yields the following output which shows the training accuracy and validation accuracy.

-------------------------
Epoch
9
-------------------------
E=
0.00696964
E=
0.350509
E=
3.49568e-05
E=
4.09073e-06
E=
1.38491e-06
E=
0.229873
E=
3.60186e-05
E=
0.000115187
E=
2.29978e-06
E=
2.69165e-06
--- 27.400235176086426 seconds ---
Training set Correct = 48435
Accuracy = 96.87193743874877%
Testing set Correct = 982
Accuracy = 9.820982098209821%

The training set accuracy is great, but then the testing set is no better than a random guess. Any idea what could be causing this?

TL;DR

Solved XOR with a model 2 inputs, 2 hidden, 1 output and sigmoid activation functions. Good results.
Tried to solve the Mnist data set with a model of 750 inputs (28X28 pixels), 50 hidden, 10 output and sigmoid activation functions. Severe overfitting issue. 95% accuracy on the training set, 10% accuracy on validation set.

Any Idea what is causing this?

How can I tune my neural network to avoid overfitting the mnist data set?

!!!!!!!!!TL;DR at the bottom!!!!!!!!

TL;DR

Answers (1)

Related Questions