Reputation: 1627
For learning purposes I am implementing my own neural network from scratch in JavaScript, and as a first task I want to solve the XOR problem. I can already solve OR and AND, but as soon as I need a hidden layer, my weights stop converging properly.
I use a 3-layer network: 2 input neurons plus 1 bias neuron, 1 hidden layer with 2 neurons plus 1 bias neuron, and 1 output neuron.
This architecture should definitely be able to solve the task. When I manually set the weights
let W1 = new Matrix([ // weights mapping layer 1 (input) to layer 2 (hidden)
    [-10, 20, 20],  // OR
    [30, -20, -20]  // NAND
]);
let W2 = new Matrix([ // weights mapping layer 2 (hidden) to layer 3 (output)
    [-30, 20, 20]   // AND
]);
I get the correct output (very close to [0, 1, 1, 0]).
But when I try to learn the weights for the XOR problem, I always get an output close to [0.5, 0.5, 0.5, 0.5] instead of [0, 1, 1, 0]. I have tried all sorts of different weight initializations, learning rates, and numbers of gradient descent iterations, with no improvement.
So I am pretty sure there is a mistake in my backpropagation algorithm (the calculation of W1grad), but I just cannot find out what's wrong... Any help would be greatly appreciated!
// X = inputs, W1, W2 = weights, y = expected outputs, alpha = learning rate
function gradientDescent(X, W1, W2, y, alpha, n_iterations) {
    for (let i = 0; i < n_iterations; i++) {
        // forward propagation
        let a1 = addBias(X);       // addBias just adds a column of 1's at the front of the matrix
        let z2 = a1.times(W1.t()); // t() = transpose
        let a2 = addBias(z2.map(sigmoid));
        let z3 = a2.times(W2.t());
        let a3 = z3.map(sigmoid);
        // calculate error (for monitoring only; not used in the update)
        let error = logCost(a3, y);
        // backpropagation
        let outputDelta = a3.minus(y);
        let hiddenDelta = outputDelta.times(W2).etimes(addBias(z2.map(sigmoidGradient))); // etimes = element-wise multiplication
        let W2grad = outputDelta.t().times(a2).timess(1 / X.h); // timess (with 2 s's) = scalar multiplication; this gradient seems to be right!
        let W1grad = hiddenDelta.cols(1, hiddenDelta.w - 1).t().times(a1).timess(1 / X.h); // drop the bias column; TODO this seems to be wrong...
        // update weights
        W1 = W1.minus(W1grad.timess(alpha));
        W2 = W2.minus(W2grad.timess(alpha));
    }
    return [W1, W2];
}
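The snippet relies on a few helpers; the exact definitions are in the CodePen below, but roughly they behave like this (a sketch, where Matrix.h and Matrix.w are the matrix height in rows and width in columns):

// Scalar helpers applied element-wise via Matrix.map:
const sigmoid = z => 1 / (1 + Math.exp(-z));
const sigmoidGradient = z => sigmoid(z) * (1 - sigmoid(z)); // derivative of sigmoid w.r.t. z
// logCost is the cross-entropy cost averaged over the X.h training examples:
// logCost(a3, y) = -(1 / X.h) * sum( y * log(a3) + (1 - y) * log(1 - a3) )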
Full code can be found here (relevant parts at the bottom, output in the console): https://codepen.io/anon/pen/oqagqd
Upvotes: 3
Views: 356
Reputation: 1627
Turns out that it was the weight initialization after all!
For some reason my algorithm seems to be very sensitive to the initialization of the weights...
Using random values in the range between -2.5 and +2.5 and 5000+ gradient descent iterations mostly yields a correct solution to the XOR problem; many other ranges do not work at all...
Using
W1 = rand(2, 3).map(x => (x-.5)*5); // values between -2.5 and +2.5
W2 = rand(1, 3).map(x => (x-.5)*5);
returns the output
0.0676236578905123
0.9425132775668613
0.9095288663122072
0.05522288831217417
which is a satisfactory approximation of XOR (ground truth = [0, 1, 1, 0]).
And BTW: adding more hidden neurons makes it much easier to get good results; see the sketch below.
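For example, going from 2 to 4 hidden neurons only requires changing the matrix dimensions (a sketch, using the same rand helper as above):

W1 = rand(4, 3).map(x => (x-.5)*5); // 2 inputs + bias -> 4 hidden neurons
W2 = rand(1, 5).map(x => (x-.5)*5); // 4 hidden neurons + bias -> 1 output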
Upvotes: 2