Reputation: 3
In an attempt to learn the ins and outs of ML, I have been implementing a neural network optimizer in C++ and wrapped it with SWIG as a Python module. Of course, the first problem I tackled was XOR, via the following snippet of code: 2 inputs, 2 hidden units, 1 output.
from MikeLearn import NeuralNetwork
from MikeLearn import ClassificationOptimizer
import time
#=======================================================
# Training Set
#=======================================================
X = [[0,1],[1,0],[1,1],[0,0]]
Y = [[1],[1],[0],[0]]
nIn = len(X[0])
nOut = len(Y[0])
#=======================================================
# Model
#=======================================================
verbosity = 0
#Initialize neural network
# NeuralNetwork([nInputs, nHidden1, nHidden2,..,nOutputs],['Activation1','Activation2'...])
N = NeuralNetwork([nIn,2,nOut],['sigmoid','sigmoid'])
N.setLoggerVerbosity(verbosity)
#Initialize the classification optimizer
#ClassificationOptimizer(NeuralNetwork,Xtrain,Ytrain)
Opt = ClassificationOptimizer(N,X,Y)
Opt.setLoggerVerbosity(verbosity)
start_time = time.time()
#fit data
#fit(nEpoch,LearningRate)
E = Opt.fit(10000,0.1)
print("--- %s seconds ---" % (time.time() - start_time))
#Make a prediction
print(Opt.predict(X))
This snippet of code yields the following output (the correct answer would be [1, 1, 0, 0]):
--- 0.10273098945617676 seconds ---
((0.9398755431175232,), (0.9397522211074829,), (0.0612373948097229,), (0.04882470518350601,))
>>>
Looks great! Now for the issue. The following snippet of code tries to learn from the MNIST dataset, but very obviously suffers from overfitting. The network has 784 inputs (28x28 pixels), 50 hidden units, and 10 outputs.
from MikeLearn import NeuralNetwork
from MikeLearn import ClassificationOptimizer
import matplotlib.pyplot as plt
import numpy as np
import pickle
import time
#=======================================================
# Data Set
#=======================================================
#load the data dictionary
with open("mnist_data.p", "rb") as f:
    modeldata = pickle.load(f)
X = modeldata['X']
Y = modeldata['Y']
#normalize data
X = np.array(X)
X = X/255
X = X.tolist()
#training set
X1 = X[0:49999]
Y1 = Y[0:49999]
#validation set
X2 = X[50000:59999]
Y2 = Y[50000:59999]
#number of inputs/outputs
nIn = len(X[0])   # 784 for 28x28 images
nOut = len(Y[0])  # 10
#=======================================================
# Model
#=======================================================
verbosity = 1
#Initialize neural network
# NeuralNetwork([nInputs, nHidden1, nHidden2,..,nOutputs],['Activation1','Activation2'...])
N = NeuralNetwork([nIn,50,nOut],['sigmoid','sigmoid'])
N.setLoggerVerbosity(verbosity)
#Initialize optimizer
#ClassificationOptimizer(NeuralNetwork,Xtrain,Ytrain)
Opt = ClassificationOptimizer(N,X1,Y1)
Opt.setLoggerVerbosity(verbosity)
start_time = time.time()
#fit data
#fit(nEpoch,LearningRate)
E = Opt.fit(10,0.1)
print("--- %s seconds ---" % (time.time() - start_time))
#================================
#Final Accuracy on training set
#================================
XL = Opt.predict(X1)
correct = 0
for i,x in enumerate(XL):
    if XL[i].index(max(XL[i])) == Y[i].index(max(Y1[i])):
        correct = correct + 1
print("Training set Correct = " + str(correct))
Accuracy = correct/len(XL)*100
print("Accuracy = " + str(Accuracy) + '%')
#================================
#Final Accuracy on validation set
#================================
XL = Opt.predict(X2)
correct = 0
for i,x in enumerate(XL):
    if XL[i].index(max(XL[i])) == Y[i].index(max(Y2[i])):
        correct = correct + 1
print("Testing set Correct = " + str(correct))
Accuracy = correct/len(XL)*100
print("Accuracy = " + str(Accuracy)+'%')
That snippet of code yields the following output, which shows the per-output error for the last epoch followed by the training and validation accuracy:
-------------------------
Epoch
9
-------------------------
E=
0.00696964
E=
0.350509
E=
3.49568e-05
E=
4.09073e-06
E=
1.38491e-06
E=
0.229873
E=
3.60186e-05
E=
0.000115187
E=
2.29978e-06
E=
2.69165e-06
--- 27.400235176086426 seconds ---
Training set Correct = 48435
Accuracy = 96.87193743874877%
Testing set Correct = 982
Accuracy = 9.820982098209821%
The training set accuracy is great, but the testing set accuracy is no better than a random guess. Any idea what could be causing this?
Upvotes: 0
Views: 213
Reputation: 1497
The cause of overfitting is a combination of the data and the model (the network, in this case). During training the network was 'lazy': it found aspects of the data that worked well on the training data but do not generalize well.
It is difficult, if not impossible, to point out exactly which nodes or weights in the trained network are responsible for the overfitting.
But we can reduce overfitting with several tricks, such as dropout and other forms of regularization:
https://machinelearningmastery.com/dropout-for-regularizing-deep-neural-networks/
To get an idea of regularization, try the playground from tensorflow:
https://playground.tensorflow.org/
A visualisation of dropout:
https://yusugomori.com/projects/deep-learning/dropout-relu
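As a rough illustration of what the dropout links above describe, here is a minimal NumPy sketch of "inverted" dropout applied to one layer's activations. The function name `dropout_forward`, the `drop_prob` parameter, and the toy 4x50 activation matrix are assumptions made for illustration; nothing here is part of MikeLearn, which may or may not support dropout.

```python
import numpy as np

def dropout_forward(activations, drop_prob, rng, training=True):
    """Inverted dropout: zero a random subset of units during training and
    rescale the survivors so the expected activation stays the same."""
    if not training or drop_prob == 0.0:
        # At prediction time every unit is kept and nothing is rescaled.
        return activations
    keep_prob = 1.0 - drop_prob
    # Boolean mask: True where a unit is kept for this forward pass.
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

# Toy example: a batch of 4 samples with 50 hidden activations, all equal to 1.
rng = np.random.default_rng(0)
hidden = np.ones((4, 50))
dropped = dropout_forward(hidden, drop_prob=0.5, rng=rng)
kept_fraction = np.count_nonzero(dropped) / dropped.size
print("fraction of units kept:", kept_fraction)
```

Because the surviving activations are divided by `keep_prob`, the same weights can be used unchanged at prediction time by calling the function with `training=False`.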
Besides trying out regularisation techniques, also experiment with different NN architectures.
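Before tuning anything, it may also be worth ruling out an evaluation problem. A validation accuracy of almost exactly 10% on a 10-class problem is also what you would see if predictions were compared against unrelated labels, and the validation loop in the question compares the predictions for `X2` with `Y[i]` (indexed from the start of the full label list) rather than `Y2[i]`. The sketch below is one way to compute accuracy so that predictions and labels always come from the same split; the helper names `argmax` and `accuracy` and the toy data are assumptions for illustration.

```python
def argmax(values):
    """Index of the largest entry in a sequence (list or tuple)."""
    return max(range(len(values)), key=lambda k: values[k])

def accuracy(predictions, targets):
    """Percentage of rows whose predicted class matches the one-hot target.
    Both arguments must come from the same split, in the same order."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets differ in length")
    correct = sum(argmax(p) == argmax(t) for p, t in zip(predictions, targets))
    return 100.0 * correct / len(predictions)

# Toy data: four two-class predictions, three of which match their labels.
preds = [(0.9, 0.1), (0.2, 0.8), (0.6, 0.4), (0.3, 0.7)]
labels = [[1, 0], [0, 1], [0, 1], [0, 1]]
print("Accuracy = " + str(accuracy(preds, labels)) + "%")
```

With the variables from the question this would be called as `accuracy(Opt.predict(X2), Y2)`. Separately, because Python slices exclude their end index, `X[0:49999]` and `X[50000:59999]` each leave out one sample; `X[:50000]` and `X[50000:60000]` would use every sample, assuming the pickle holds 60,000 of them.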
Upvotes: 1