williamg
williamg

Reputation: 2758

Neural Network Back-Propagation Algorithm Gets Stuck on XOR Training PAttern

Overview

So I'm trying to get a grasp on the mechanics of neural networks. I still don't totally grasp the math behind it, but I think I understand how to implement it. I currently have a neural net that can learn AND, OR, and NOR training patterns. However, I can't seem to get it to implement the XOR pattern. My feed forward neural network consists of 2 inputs, 3 hidden, and 1 output. The weights and biases are randomly set between -0.5 and 0.5, and outputs are generated with the sigmoidal activation function

Algorithm

So far, I'm guessing I made a mistake in my training algorithm which is described below:

  1. For each neuron in the output layer, provide an error value that is the desiredOutput - actualOutput --go to step 3
  2. For each neuron in a hidden or input layer (working backwards) provide an error value that is the sum of all forward connection weights * the errorGradient of the neuron at the other end of the connection --go to step 3
  3. For each neuron, using the error value provided, generate an error gradient that equals output * (1-output) * error. --go to step 4
  4. For each neuron, adjust the bias to equal current bias + LEARNING_RATE * errorGradient. Then adjust each backward connection's weight to equal current weight + LEARNING_RATE * output of neuron at other end of connection * this neuron's errorGradient

I'm training my neural net online, so this runs after each training sample.

Code

This is the main code that runs the neural network:

private void simulate(double maximumError) {

    int errorRepeatCount = 0;
    double prevError = 0;

    double error; // summed squares of errors
    int trialCount = 0;

    do {

        error = 0;

        // loop through each training set
        for(int index = 0; index < Parameters.INPUT_TRAINING_SET.length; index++) {

            double[] currentInput = Parameters.INPUT_TRAINING_SET[index];
            double[] expectedOutput = Parameters.OUTPUT_TRAINING_SET[index];
            double[] output = getOutput(currentInput);

            train(expectedOutput);

            // Subtracts the expected and actual outputs, gets the average of those outputs, and then squares it.
            error += Math.pow(getAverage(subtractArray(output, expectedOutput)), 2); 



        }

    } while(error > maximumError);

Now the train() function:

public void train(double[] expected) {

    layers.outputLayer().calculateErrors(expected);

    for(int i = Parameters.NUM_HIDDEN_LAYERS; i >= 0; i--) {
        layers.allLayers[i].calculateErrors();
    }

}

Output layer calculateErrors() function:

public void calculateErrors(double[] expectedOutput) {

    for(int i = 0; i < numNeurons; i++) {

        Neuron neuron = neurons[i];
        double error = expectedOutput[i] - neuron.getOutput();
        neuron.train(error);

    }

}

Normal (Hidden & Input) layer calculateErrors() function:

public void calculateErrors() {

    for(int i = 0; i < neurons.length; i++) {

        Neuron neuron = neurons[i];

        double error = 0;

        for(Connection connection : neuron.forwardConnections) {

            error += connection.output.errorGradient * connection.weight;

        }

        neuron.train(error);

    }

}

Full Neuron class:

package neuralNet.layers.neurons;

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

import neuralNet.Parameters;
import neuralNet.layers.NeuronLayer;

public class Neuron {

private double output, bias;
public List<Connection> forwardConnections = new ArrayList<Connection>(); // Forward = layer closer to input -> layer closer to output
public List<Connection> backwardConnections = new ArrayList<Connection>(); // Backward = layer closer to output -> layer closer to input

public double errorGradient;
public Neuron() {

    Random random = new Random();
    bias = random.nextDouble() - 0.5;

}

public void addConnections(NeuronLayer prevLayer) {

    // This is true for input layers. They create their connections differently. (See InputLayer class)
    if(prevLayer == null) return;

    for(Neuron neuron : prevLayer.neurons) {

        Connection.createConnection(neuron, this);

    }

}

public void calcOutput() {

    output = bias;

    for(Connection connection : backwardConnections) {

        connection.input.calcOutput();
        output += connection.input.getOutput() * connection.weight;

    }

    output = sigmoid(output);

}

private double sigmoid(double output) {
    return 1 / (1 + Math.exp(-1*output));
}

public double getOutput() {
    return output;
}

public void train(double error) {

    this.errorGradient = output * (1-output) * error;

    bias += Parameters.LEARNING_RATE * errorGradient;

    for(Connection connection : backwardConnections) {

        // for clarification: connection.input refers to a neuron that outputs to this neuron
        connection.weight += Parameters.LEARNING_RATE * connection.input.getOutput() * errorGradient;

    }

}

}

Results

When I'm training for AND, OR, or NOR the network can usually converge within about 1000 epochs, however when I train with XOR, the outputs become fixed and it never converges. So, what am I doing wrong? Any ideas?

Edit

Following the advice of others, I started over and implemented my neural network without classes...and it works. I'm still not sure where my problem lies in the above code, but it's in there somewhere.

Upvotes: 12

Views: 9197

Answers (8)

user1539778
user1539778

Reputation: 61

I couldn't see anything wrong with the code, but I was having a similar problem with my network not converging for XOR, so figured I'd post my working configuration.

3 input neurons (one of them being a fixed bias of 1.0)
3 hidden neurons
1 output neuron

Weights randomly chosen between -0.5 and 0.5.
Sigmoid activation function.

Learning rate = 0.2
Momentum = 0.4
Epochs = 50,000

Converged 10/10 times.

One of the mistakes I was making was not connecting the bias input to the output neuron, and this would mean for the same configuration it only converged 2 out of 10 times with the other eight times failing because 1 and 1 would output 0.5.

Another mistake was not doing enough epochs. If I only did 1000 then the outputs tend to be around 0.5 for every test case. With epochs >= 8000 so 2000 times for each test case, it started to look like it might be working (but only if using momentum).

When doing 50000 epochs it did not matter whether momentum was used or not.

Another thing I tried was to not apply the sigmoid function to the output neurons output (which I think was what an earlier post had suggested), but this wrecked the network because the output*(1-output) part of the error equation could now be negative meaning weights were updated in a way that caused the error to increase.

Upvotes: 0

Janek Olszak
Janek Olszak

Reputation: 4303

Small hint - if the output of your NN seem to drift toward 0.5 then everything's OK!

The algorithm using just the learning rate and bias is just too simple to quickly learn XOR. You can either increase the number of epochs or change the algorithm.

My recommendation is to use momentum:

  • 1000 epochs
  • learningRate = 0.3
  • momentum = 0.8
  • weights drawn from [0,1]
  • bias drawn form [-0.5, 0.5]

And some crucial pseudo code (assuming back and forward propagation works) :

for every edge:
    previous_edge_weight_change = -1 * learningRate * edge_source_neuron_value * edge_target_neuron_delta + previous_edge_weight * momentum

    edge_weight += previous_edge_weight_change

for every neuron:
    previous_neuron_bias_change = -1 * learningRate * neuron_delta + previous_neuron_bias_change * momentum

    bias += previous_neuron_bias_change

Upvotes: 3

mrbo
mrbo

Reputation: 3437

I faced the same problem short time ago. Finally I found the solution, how to write a code solving XOR wit the MLP algorithm.

The XOR problem seems to be an easy to learn problem but it isn't for the MLP because it's not linearly separable. So even if your MLP is OK (I mean there is no bug in your code) you have to find the good parameters to be able to learn the XOR problem.

Two hidden and one output neuron is fine. The 2 main thing you have to set:

  • although you have only 4 training samples you have to run the training for a couple of thousands epoch.
  • if you use sigmoid hidden layers but linear output the network will converge faster

Here is the detailed description and sample code: http://freeconnection.blogspot.hu/2012/09/solving-xor-with-mlp.html

Upvotes: 5

williamg
williamg

Reputation: 2758

LiKao's comment to simplify my implementation and get rid of the object-oriented aspects solved my problem. The flaw in the algorithm as it is described above is unknown, however I now have a working neural network that is a lot smaller.

Feel free to continue to provide insight on the problem with my previous implementation, as others may have the same problem in the future.

Upvotes: 1

LiKao
LiKao

Reputation: 10658

It's been a while since I last implemented an Neural Network myself, but I think your mistake is in the lines:

bias += Parameters.LEARNING_RATE * errorGradient;

and

connection.weight += Parameters.LEARNING_RATE * connection.input.getOutput() * errorGradient;

The first of these lines should not be there at all. Bias is best modeled as the input of a neuron which is fixed at 1. This will serve to make your code a lot simpler and cleaner, because you will not have to treat the bias in any special way.

The other point is, that I think the sign is wrong in both of these expressions. Think about it like this:

  1. Your gradient points into the direction of steepest ascend, so if you go into that direction, your error will get larger.

  2. What you are doing here is adding something to the weights, in case the error is already positive, i.e. you are making it more positive. If it is negative you are substracting someting, i.e. you make it more negative.

Unless I am missing something about your definition of error or the gradient calculation you should change these lines to:

bias -= Parameters.LEARNING_RATE * errorGradient;

and

connection.weight -= Parameters.LEARNING_RATE * connection.input.getOutput() * errorGradient;

I had a similar mistake in one of my early implementations and it lead to exactly the same behaviour, i.e. it resulted in a network that learned in simple cases, but not anymore once the training data became more complex.

Upvotes: 1

Philip JF
Philip JF

Reputation: 28539

This is surprising because you are using a big enough network (barely) to learn XOR. Your algorithm looks right, so I dont really know what is going on. It might help to know how you generate your training data: are you just reating the samples (1,0,1),(1,1,0),(0,1,1),(0,0,0) or something like that over and over? Perhaps the problem is that stochastic gradient descent is causing you to jump around the stabilizing minima. You could try some things to fix this: perhaps randomly sample from your training examples instead of repeating them (if that is what you are doing). Or, alternatively, you could modify your learning algorithm:

currently you have something equivalent to:

weight(epoch) = weight(epoch - 1) + deltaWeight(epoch)
deltaWeight(epoch) = mu * errorGradient(epoch)

where mu is the learning rate

One option is to very slowly decrease the value of mu.

An alternative would be to change your definition of deltaWeight to include a "momentum"

deltaWeight(epoch) = mu * errorGradient(epoch) + alpha * deltaWeight(epoch -1)

where alpha is the momentum parameter (between 0 and 1).

Visually, you can think of gradient descent as trying to find the minimum point of a curved surface by placing an object on that surface, and then step by step moving that object small amounts in which ever directing is sloping down based on where it is currently located. The problem is that you dont really do gradient descent: instead you do stochastic gradient descent where you move in direction by sampling from a set of training vectors and moving in what ever direction the sample makes look like is down. On average over the entire training data, stochastic gradient descent should work, but it is isn't guaranteed to because you can get into a situation where you jump back and forth never making progress. Slowly decreasing the learning rate means you take smaller and smaller steps each time so can not get stuck in an infinite cycle.

On the other hand, momentum makes the algorithm into something akin to rolling a rubber ball. As the ball roles it tends to go in the downwards direction, but it also tends to keep going in the direction it was going before, and if it is ever on a stretch where the down slope is in the same direction for a while it will speed up. The ball will therefore jump over some local minima, and it will be more resilient against stepping back and forth over the target because doing so means working against the force of momentum.


Having some code and thinking about this some more, it is pretty clear that your problem is in training the early layers. The functions you have successfully learned are all linearly separable, so it would make sense that only a single layer is being properly learned. I agree with LiKao about implementation strategies in general, although your approach should work. My suggestion for how to debug this is figure out what the progression of the weights on the connections between the input layer and the output layer looks like.

You should post the rest implementation of Neuron.

Upvotes: 7

Emmanuel Hadoux
Emmanuel Hadoux

Reputation: 70

I would suggest you to generate a grid (say from [-5,-5] to [5,5] with a step like 0.5), learn your MLP on the XOR and apply it to the grid. Plotted in color you could see some kind of a frontier. If you do that at each iteration, you'll see the evolution of the frontier and can control the learning.

Upvotes: 1

enzom83
enzom83

Reputation: 8310

I'm a bit rusty on neural networks, but I think there was a problem to implement the XOR with one perceptron: basically a neuron is able to separate two groups of solutions through a straight line, but one straight line is not sufficient for the XOR problem...

Here should be the answer!

Upvotes: 0

Related Questions