Reputation: 4545
I have implemented a neural network class with exactly one hidden layer, using no libraries, not even numpy. I have written everything the way I understood it should work, but the network is not learning at all: the loss increases continuously, and I cannot find where I have gone wrong, even after looking at many examples online.
Here is my MLP class and a demo of it attempting to learn the XOR function:
import random
from math import exp

class MLP:
    def __init__(self, numInputs, numHidden, numOutputs):
        # MLP architecture sizes
        self.numInputs = numInputs
        self.numHidden = numHidden
        self.numOutputs = numOutputs
        # MLP weights
        self.IH_weights = [[random.random() for i in range(numHidden)] for j in range(numInputs)]
        self.HO_weights = [[random.random() for i in range(numOutputs)] for j in range(numHidden)]
        # Gradients corresponding to weight matrices computed during backprop
        self.IH_gradients = [[0 for i in range(numHidden)] for j in range(numInputs)]
        self.HO_gradients = [[0 for i in range(numOutputs)] for j in range(numHidden)]
        # Input, hidden and output neuron values
        self.I = None
        self.H = [0 for i in range(numHidden)]
        self.O = [0 for i in range(numOutputs)]
        self.H_deltas = [0 for i in range(numHidden)]
        self.O_deltas = [0 for i in range(numOutputs)]

    # Sigmoid
    def activation(self, x):
        return 1 / (1 + exp(-x))

    # Derivative of Sigmoid
    def activationDerivative(self, x):
        return x * (1 - x)

    # Squared Error
    def calculateError(self, prediction, label):
        return (prediction - label) ** 2

    def forward(self, input):
        self.I = input
        for i in range(self.numHidden):
            for j in range(self.numInputs):
                self.H[i] += self.I[j] * self.IH_weights[j][i]
            self.H[i] = self.activation(self.H[i])
        for i in range(self.numOutputs):
            for j in range(self.numHidden):
                self.O[i] += self.activation(self.H[j] * self.HO_weights[j][i])
            self.O[i] = self.activation(self.O[i])
        return self.O

    def backwards(self, label):
        if label != list:
            label = [label]
        error = 0
        for i in range(self.numOutputs):
            neuronError = self.calculateError(self.O[i], label[i])
            error += neuronError
            self.O_deltas[i] = neuronError * self.activationDerivative(self.O[i])
            for j in range(self.numHidden):
                self.HO_gradients[j][i] += self.O_deltas[i] * self.H[j]
        for i in range(self.numHidden):
            neuronError = 0
            for j in range(self.numOutputs):
                neuronError += self.HO_weights[i][j] * self.O_deltas[j]
            self.H_deltas[i] = neuronError * self.activationDerivative(self.H[i])
            for j in range(self.numInputs):
                self.IH_gradients[j][i] += self.H_deltas[i] * self.I[j]
        return error

    def updateWeights(self, learningRate):
        for i in range(self.numInputs):
            for j in range(self.numHidden):
                self.IH_weights[i][j] += learningRate * self.IH_gradients[i][j]
        for i in range(self.numHidden):
            for j in range(self.numOutputs):
                self.HO_weights[i][j] += learningRate * self.HO_gradients[i][j]
        self.IH_gradients = [[0 for i in range(self.numHidden)] for j in range(self.numInputs)]
        self.HO_gradients = [[0 for i in range(self.numOutputs)] for j in range(self.numHidden)]

data = [
    [[0, 0], 0],
    [[0, 1], 1],
    [[1, 0], 1],
    [[1, 1], 0]
]
mlp = MLP(2, 5, 1)
for epoch in range(100):
    epochError = 0
    for i in range(len(data)):
        mlp.forward(data[i][0])
        epochError += mlp.backwards(data[i][1])
    print(epochError / len(data))
    mlp.updateWeights(0.001)
Upvotes: 3
Views: 214
Reputation: 11
How did you go with this? I showed it to a friend, and we both found your goal of implementing the algorithm without much abstraction edifying, although it makes errors harder to find.
The improvement he found is that updateWeights needs to be a negative feedback loop (the weights should move against the gradient), so change "+=" to "-=" in two lines, giving:
self.IH_weights[i][j] -= learningRate * self.IH_gradients[i][j]
and
self.HO_weights[i][j] -= learningRate * self.HO_gradients[i][j]
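To see why the sign matters, here is a minimal sketch on a one-parameter toy loss rather than the MLP itself. The names `loss`, `grad` and `run` are hypothetical and exist only for this illustration; the step size and step count are arbitrary choices.

```python
def loss(w):
    # Toy quadratic loss with its minimum at w = 3
    return (w - 3.0) ** 2

def grad(w):
    # Derivative of the toy loss with respect to w
    return 2.0 * (w - 3.0)

def run(sign, steps=50, learning_rate=0.1, w=0.0):
    # sign=-1 steps against the gradient (descent, like "-=");
    # sign=+1 steps along the gradient (ascent, like "+=")
    for _ in range(steps):
        w += sign * learning_rate * grad(w)
    return loss(w)

descent_loss = run(-1)
ascent_loss = run(+1)
print("descent:", descent_loss)
print("ascent: ", ascent_loss)
```

Stepping against the gradient shrinks the loss towards zero, while stepping along it makes the loss grow without bound, which matches the continuously increasing loss described in the question.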
The other factor is increasing the learning rate. With these changes, the error descends to about 16% (for me; I may have made another change that I am not seeing) before it begins to climb, asymptoting to 27%, maybe due to overtraining with a learning rate that is too high.
I made the learning rate dependent on the epoch:
mlp.updateWeights(0.1/(0.01 * (epoch+1)))
and the error decreases steadily and stabilizes at 0.161490...
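For reference, the schedule above simplifies to 10 / (epoch + 1), so it starts at a large step size and decays inversely with the epoch number. A small sketch that tabulates it (the function name `learning_rate` is hypothetical; the expression is the one passed to updateWeights above):

```python
def learning_rate(epoch):
    # Same expression as passed to updateWeights; equivalent to 10 / (epoch + 1)
    return 0.1 / (0.01 * (epoch + 1))

# Print the step size at a few epochs of the 100-epoch run
for epoch in (0, 1, 9, 49, 99):
    print(epoch, learning_rate(epoch))
```

The first few epochs therefore take much larger steps than the original 0.001, and the rate only falls to about 0.1 by the last epoch.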
But if you get the prediction from 'forward', it's always predicting about 0.66, regardless of the input: the inputs have been wiped away. So... that's bad.
- Input Data: [0, 0] | Prediction: [0.6610834017294481] | Truth: 0
- Input Data: [0, 1] | Prediction: [0.6616502691118376] | Truth: 1
- Input Data: [1, 0] | Prediction: [0.6601936411430607] | Truth: 1
- Input Data: [1, 1] | Prediction: [0.6596122207209283] | Truth: 0
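One thing that may contribute to these near-identical predictions: `forward` in the question accumulates into `self.H[i]` and `self.O[i]` with `+=` without resetting them, so each call starts from the previous call's activations, and it applies the sigmoid to every hidden-to-output product before summing. Below is a minimal sketch of a forward pass that recomputes each weighted sum from zero. It is written as a standalone function, `forward_pass` and its argument names are hypothetical, the example weights are made up, and bias terms are still omitted as in the original. I have not verified that this alone makes the network learn XOR.

```python
from math import exp

def sigmoid(x):
    return 1 / (1 + exp(-x))

def forward_pass(inputs, ih_weights, ho_weights):
    # ih_weights[j][i]: weight from input j to hidden neuron i
    # ho_weights[j][i]: weight from hidden neuron j to output neuron i
    num_hidden = len(ih_weights[0])
    num_outputs = len(ho_weights[0])
    hidden = []
    for i in range(num_hidden):
        total = 0.0  # start each weighted sum from zero on every call
        for j, value in enumerate(inputs):
            total += value * ih_weights[j][i]
        hidden.append(sigmoid(total))
    outputs = []
    for i in range(num_outputs):
        total = 0.0
        for j, h in enumerate(hidden):
            total += h * ho_weights[j][i]  # plain weighted sum
        outputs.append(sigmoid(total))  # activation applied once per neuron
    return hidden, outputs

# Made-up weights for a 2-2-1 network, for illustration only
ih = [[0.5, -0.5], [0.25, 0.75]]
ho = [[1.0], [-1.0]]
first = forward_pass([0, 1], ih, ho)
second = forward_pass([0, 1], ih, ho)
print(first == second)
```

Calling it twice with the same input gives the same result, which is not guaranteed when the sums carry over between calls. Note also that without bias terms the input [0, 0] always produces hidden activations of exactly 0.5, whatever the weights are.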
Upvotes: 1
Reputation: 54
If I understood your implementation correctly, I believe the problem is in the calculation of the weight updates in the backwards function: the output delta should be the error (not the squared error) multiplied by the sigmoid derivative, so I would take another look at those calculations and redo them.
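As a rough sketch of that point, assuming the loss is taken as 0.5 * (prediction - label) ** 2 so that the factor of 2 cancels, the output delta can be checked against a finite-difference estimate. The names `output_delta` and `half_squared_error` are hypothetical, and the test point is arbitrary.

```python
from math import exp

def sigmoid(x):
    return 1 / (1 + exp(-x))

def output_delta(prediction, label):
    # Derivative of 0.5 * (prediction - label) ** 2 with respect to the
    # pre-activation z, where prediction = sigmoid(z).
    # Uses the signed error, not the squared error.
    return (prediction - label) * prediction * (1 - prediction)

def half_squared_error(z, label):
    return 0.5 * (sigmoid(z) - label) ** 2

# Central finite-difference check at an arbitrary point
z, label, eps = 0.3, 1.0, 1e-6
analytic = output_delta(sigmoid(z), label)
numeric = (half_squared_error(z + eps, label)
           - half_squared_error(z - eps, label)) / (2 * eps)
print(analytic, numeric)
```

The delta is negative here because the prediction is below the label, so subtracting it from the weights pushes the output up. A delta built from the squared error is never negative, so the direction of the correction is lost.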
Upvotes: 1