Reputation: 1627
For learning purposes I am implementing my own neural network from scratch in JavaScript, and as a first task I want to solve the XOR problem. I can already solve OR and AND, but as soon as I need a hidden layer, my weights stop converging properly.
I use a 3-layer network: 2 input neurons plus 1 bias neuron, 1 hidden layer with 2 neurons plus 1 bias neuron, and 1 output neuron.
This architecture should definitely be able to solve the task. When I manually set the weights
let W1 = new Matrix([ // weights mapping layer 1 (input) to layer 2 (hidden)
    [-10, 20, 20],  // OR
    [30, -20, -20]  // NAND
]);
let W2 = new Matrix([ // weights mapping layer 2 (hidden) to layer 3 (output)
    [-30, 20, 20]   // AND
]);
I get the correct output (very close to [0, 1, 1, 0]).
But when I try to learn the weights for the XOR problem, I always get an output close to [0.5, 0.5, 0.5, 0.5] instead of [0, 1, 1, 0]. I have tried all sorts of different weight initializations, learning rates, and numbers of gradient descent iterations, with no improvement.
So I am pretty sure there is a mistake in my backpropagation algorithm (the calculation of W1grad), but I just cannot find out what's wrong... Any help would be greatly appreciated!
// X = inputs, W1, W2 = weights, y = expected outputs, alpha = learning rate
function gradientDescent(X, W1, W2, y, alpha, n_iterations) {
    for (let i = 0; i < n_iterations; i++) {
        // forward propagation
        let a1 = addBias(X);       // addBias just adds a column of 1's at the front of the matrix
        let z2 = a1.times(W1.t()); // t() = transpose
        let a2 = addBias(z2.map(sigmoid));
        let z3 = a2.times(W2.t());
        let a3 = z3.map(sigmoid);
        // calculate error (for monitoring only; not used in the update)
        let error = logCost(a3, y);
        // backpropagation
        let outputDelta = a3.minus(y);
        let hiddenDelta = outputDelta.times(W2).etimes(addBias(z2.map(sigmoidGradient))); // etimes = element-wise multiplication
        let W2grad = outputDelta.t().times(a2).timess(1 / X.h); // timess (with 2 s's) = scalar multiplication; this gradient seems to be right!
        let W1grad = hiddenDelta.cols(1, hiddenDelta.w - 1).t().times(a1).timess(1 / X.h); // drop the bias column; TODO this seems to be wrong...
        // update weights
        W1 = W1.minus(W1grad.timess(alpha));
        W2 = W2.minus(W2grad.timess(alpha));
    }
    return [W1, W2];
}
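The snippet relies on a few helpers; the exact definitions are in the CodePen below, but roughly they behave like this (a sketch, where Matrix.h and Matrix.w are the matrix height in rows and width in columns):

// Scalar helpers applied element-wise via Matrix.map:
const sigmoid = z => 1 / (1 + Math.exp(-z));
const sigmoidGradient = z => sigmoid(z) * (1 - sigmoid(z)); // derivative of sigmoid w.r.t. z
// logCost is the cross-entropy cost averaged over the X.h training examples:
// logCost(a3, y) = -(1 / X.h) * sum( y * log(a3) + (1 - y) * log(1 - a3) )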
Full code can be found here (relevant parts at the bottom, output in the console): https://codepen.io/anon/pen/oqagqd
Upvotes: 3
Views: 356
Reputation: 1627
Turns out that it was the weight initialization after all!
For some reason my algorithm seems to be very sensitive to the initialization of the weights...
Using random values in the range between -2.5 and +2.5 and 5000+ gradient descent iterations mostly yields a correct solution to the XOR problem; many other ranges do not work at all...
Using
W1 = rand(2, 3).map(x => (x-.5)*5); // values between -2.5 and +2.5
W2 = rand(1, 3).map(x => (x-.5)*5);
returns the output
0.0676236578905123
0.9425132775668613
0.9095288663122072
0.05522288831217417
which is a satisfactory approximation of XOR (ground truth = [0, 1, 1, 0]).
And BTW: adding more hidden neurons makes it much easier to get good results; see the sketch below.
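For example, going from 2 to 4 hidden neurons only requires changing the matrix dimensions (a sketch, using the same rand helper as above):

W1 = rand(4, 3).map(x => (x-.5)*5); // 2 inputs + bias -> 4 hidden neurons
W2 = rand(1, 5).map(x => (x-.5)*5); // 4 hidden neurons + bias -> 1 output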
Upvotes: 2