Aditya

Reputation: 1268

Simple neural network with linear layers not generating the expected output

I have been following the Jeff Heaton guide online, and I have reached the point where I am trying to create a simple NN: three input neurons and one output neuron, no hidden layer, and three weights, one per input neuron.

The network is supposed to pick out particular 3-bit binary combinations from a set of six (the ideal outputs in the code below are 0, 1, 1, 0, 0, 1).

Here's the code:

class neural {
    double weight1 = 1.0, weight2 = 1.0, weight3 = 1.0;
    double learningRate = 0.000001;

    public double getOutput(double i1, double i2, double i3, double ideals) {
        double u = weight1 * i1 + weight2 * i2 + weight3 * i3; // weighted sum
        double error = ideals - u;                             // how far off we are
        weight1 += error * learningRate * i1;                  // delta-rule update
        weight2 += error * learningRate * i2;
        weight3 += error * learningRate * i3;

        return u;
    }
}

public class pattern {
    public static void main(String[] args) {
        neural a = new neural();
        for (int i = 0; i < 2000; i++) {
            a.getOutput(0.0, 0.0, 0.0, 0.0);
            a.getOutput(0.0, 0.0, 1.0, 1.0);
            a.getOutput(0.0, 1.0, 0.0, 1.0);
            a.getOutput(0.0, 1.0, 1.0, 0.0);
            a.getOutput(1.0, 1.0, 0.0, 0.0);
            a.getOutput(1.0, 1.0, 1.0, 1.0);
        }
    }
}

I tried learning rates as low as 0.000001, as suggested by @Widdershins.

I treat anything above 0.5 as 1 and anything below as 0, so the outputs are 000101 instead of the expected 011001.
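
For what it's worth, here is a minimal sketch of that thresholding step, reusing the neural class above; the threshold helper is a name made up for illustration, and note that calling getOutput to read results also nudges the weights one more time:

// Hypothetical helper: apply the 0.5 cutoff described above.
static int threshold(double u) {
    return u > 0.5 ? 1 : 0;
}

// After the training loop, read the outputs back (each call still
// performs one extra weight update, which is negligible here):
System.out.print(threshold(a.getOutput(0.0, 0.0, 0.0, 0.0)));
System.out.print(threshold(a.getOutput(0.0, 0.0, 1.0, 1.0)));
// ... and so on for the remaining four patterns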

Upvotes: 1

Views: 565

Answers (2)

Mumbleskates

Reputation: 1338

So, let's get this sorted out in our heads.

u is the result you get with the inputs and the given weights.

ideals is the output you hope to achieve.

error is then the amount u went wrong; it should be the distance from u to ideals. That is, it should be ideals - u. That seems right.

Your learning rate seems pretty high, though. Setting it too high can cause oscillation instead of convergence, especially for highly regular inputs. Have you checked what your weight values look like between successive runs near the end of the learning loop? Have you tried lowering the learning rate?

Disclaimer: I'm not a neural network expert and you should consider any assertions I make to be conjecture, but this is my understanding.

Edit: I tried running your code with much smaller learning rates (between 0.25 and 0.01), with as few as 200 loops, and got the desired output. You should not need anywhere near twenty thousand loops for a network this simple. Remember to keep your learning rate low enough to avoid strange results: at about 200 learning loops, the network starts outputting the incorrect 000101 instead of 011001 as soon as the learning rate hits a critical value of about 0.7. Lower learning rates, even VERY low ones, give much better results.
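
For reference, a minimal sketch of that experiment, assuming the asker's neural class is in scope in the same package (the RateSweep name and the 0.5 cutoff are additions for illustration, not part of the original answer):

// Sketch: train a fresh network at several learning rates and print the
// thresholded outputs, to see where convergence breaks down.
public class RateSweep {
    public static void main(String[] args) {
        double[][] teach = {
            {0, 0, 0, 0}, {0, 0, 1, 1}, {0, 1, 0, 1},
            {0, 1, 1, 0}, {1, 1, 0, 0}, {1, 1, 1, 1}
        };
        for (double rate : new double[] {0.01, 0.1, 0.25, 0.7}) {
            neural n = new neural();
            n.learningRate = rate;            // assumes the field is accessible
            for (int i = 0; i < 200; i++)     // ~200 loops, as in the answer
                for (double[] p : teach)
                    n.getOutput(p[0], p[1], p[2], p[3]);
            StringBuilder out = new StringBuilder();
            for (double[] p : teach)          // one more pass just to read outputs
                out.append(n.getOutput(p[0], p[1], p[2], p[3]) > 0.5 ? 1 : 0);
            System.out.println("rate " + rate + " -> " + out);
        }
    }
}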


Now that we are looking into sigmoid functions:

import java.util.Random;
import java.util.Arrays;

public class NeuralNet {
  static final Random rand = new Random();


  // training rows: {i1, i2, i3, ideal}
  static final double[][] teach = new double[][]
  { {0d, 0d, 0d, 0d},
    {0d, 0d, 1d, 0d},
    {0d, 1d, 0d, 1d},
    {0d, 1d, 1d, 0d},
    {1d, 1d, 0d, 1d},
    {1d, 1d, 1d, 0d} };


  public static void main(String[] args) {
    Neural a = new Neural();        
    for(int i = 0; i < 2000; i++){
      int t = rand.nextInt(teach.length);
      a.learn(teach[t][0], teach[t][1], teach[t][2], teach[t][3]);
    }

    System.out.println(a);
    for (int t = 0; t < teach.length; t++) {
      System.out.println(a.react(teach[t][0], teach[t][1], teach[t][2]));
    }
  }

  public static double sigmoid(double u) {
    return 1 / (1 + Math.exp(-u));
  }

  static class Neural {
    static final double INIT_WEIGHT_RANGE = 1 / Math.sqrt(3);
    final double LEARNING_RATE = 0.1;

    double offset = (rand.nextDouble() * 2 - 1) * INIT_WEIGHT_RANGE,
      weight1 = (rand.nextDouble() * 2 - 1) * INIT_WEIGHT_RANGE,
      weight2 = (rand.nextDouble() * 2 - 1) * INIT_WEIGHT_RANGE,
      weight3 = (rand.nextDouble() * 2 - 1) * INIT_WEIGHT_RANGE;

    public double learn(double i1, double i2, double i3, double ideals) {
      double u =
        offset +
        weight1 * i1 +
        weight2 * i2 +
        weight3 * i3;
      u = sigmoid(u);
      double correction = (ideals - u) * LEARNING_RATE;

      offset += correction;
      weight1 += correction * i1;
      weight2 += correction * i2;
      weight3 += correction * i3;

      return u;
    }

    public double react(double i1, double i2, double i3) {
      double u =
        offset +
        weight1 * i1 +
        weight2 * i2 +
        weight3 * i3;
      return sigmoid(u);
    }

    public String toString() {
      // how lazy!
      return Arrays.toString(new double[] {offset, weight1, weight2, weight3});
    }
  }
}

I've done a fair bit of reading just now on what kind of backpropagation function we should have, but leaving it linear like this seems to work just fine. As far as I can tell, that may well be right. With enough epochs it will learn pretty much any values from 0 to 1.
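
For comparison, the textbook gradient-descent update for a sigmoid output under squared error also multiplies by the sigmoid's derivative u * (1 - u). A minimal variant of learn above, as a sketch rather than what the answer actually ran:

// Sketch: same update as learn(), but scaled by the sigmoid derivative
// u * (1 - u), which is the gradient step for squared error.
public double learnWithDerivative(double i1, double i2, double i3, double ideals) {
    double u = sigmoid(offset + weight1 * i1 + weight2 * i2 + weight3 * i3);
    double correction = (ideals - u) * u * (1 - u) * LEARNING_RATE;
    offset  += correction;
    weight1 += correction * i1;
    weight2 += correction * i2;
    weight3 += correction * i3;
    return u;
}

Incidentally, the plain (ideals - u) correction used above is exactly the gradient step for a sigmoid output under cross-entropy loss, which would explain why leaving it "linear" works so well here.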

Upvotes: 2

Jules

Reputation: 15199

Your training patterns with ideal output 0 and ideal output 1 are not linearly separable, which means that the expected output you're trying to get cannot be learned by a network with no hidden layer. Notice in particular that the outputs you're asking for when i1 = 0 are equivalent to the well-known XOR problem. See an explanation of this here.
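
To make that concrete, here is a minimal sketch (an illustration, not from the answer) of a 2-2-1 network, i.e. one hidden layer of two sigmoid neurons, learning XOR, which no single-layer network can do:

import java.util.Random;

// Sketch: a 2-2-1 network (one hidden layer) learning XOR.
public class XorDemo {
    static final Random rand = new Random();
    static final double RATE = 0.5;
    static double[][] w1 = new double[2][3]; // hidden weights: {in1, in2, bias}
    static double[] w2 = new double[3];      // output weights: {hid1, hid2, bias}
    static double[] hidden = new double[2];  // last hidden activations

    static double sigmoid(double x) { return 1 / (1 + Math.exp(-x)); }

    static double forward(double a, double b) {
        for (int h = 0; h < 2; h++)
            hidden[h] = sigmoid(w1[h][0] * a + w1[h][1] * b + w1[h][2]);
        return sigmoid(w2[0] * hidden[0] + w2[1] * hidden[1] + w2[2]);
    }

    static void learn(double a, double b, double ideal) {
        double out = forward(a, b);
        double dOut = (ideal - out) * out * (1 - out);  // output delta
        for (int h = 0; h < 2; h++) {                   // backpropagate before updating w2
            double dHid = dOut * w2[h] * hidden[h] * (1 - hidden[h]);
            w1[h][0] += RATE * dHid * a;
            w1[h][1] += RATE * dHid * b;
            w1[h][2] += RATE * dHid;
        }
        w2[0] += RATE * dOut * hidden[0];
        w2[1] += RATE * dOut * hidden[1];
        w2[2] += RATE * dOut;
    }

    public static void main(String[] args) {
        for (double[] row : w1)
            for (int i = 0; i < 3; i++) row[i] = rand.nextDouble() * 2 - 1;
        for (int i = 0; i < 3; i++) w2[i] = rand.nextDouble() * 2 - 1;

        double[][] xor = { {0, 0, 0}, {0, 1, 1}, {1, 0, 1}, {1, 1, 0} };
        for (int epoch = 0; epoch < 10000; epoch++)
            for (double[] p : xor) learn(p[0], p[1], p[2]);

        // An unlucky random start can stall near 0.5; re-running usually converges.
        for (double[] p : xor)
            System.out.printf("%.0f XOR %.0f -> %.3f%n", p[0], p[1], forward(p[0], p[1]));
    }
}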

Upvotes: 1
