Macko

Reputation: 976

Teaching a neural network the XOR function

I want to teach my neural network XOR with 3 layers: an input layer with 2 neurons, fully connected to a single hidden layer with 2 neurons, and a third output layer with a single output neuron. I'm going to use the sigmoid activation function and gradient descent.

My questions are:

1. How should the stopping condition be formulated? I know that we can check the number of iterations or check whether the error is below some acceptable error, but how should this error be calculated? What is the formula? Only the error calculated on the output layer? During a single pass of teaching one sample?

2. Can the bias value be less than 1 but greater than 0? Some descriptions say it should always be 1, but others say it can be a random number from this range.

Upvotes: 1

Views: 441

Answers (1)

Anand C U

Reputation: 915

Here is a one-hidden-layer network with backpropagation which can be customized to run experiments with relu, sigmoid, and other activation functions. After several experiments it was concluded that with relu the network performed better and reached convergence sooner, while with sigmoid the loss value fluctuated. This happens because "the gradient of sigmoids becomes increasingly small as the absolute value of x increases".
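To see the effect numerically, here is a small illustrative sketch (not part of the experiment itself) evaluating the sigmoid derivative sigmoid'(x) = sigmoid(x)*(1 - sigmoid(x)), which peaks at 0.25 at x = 0 and shrinks rapidly as |x| grows:

import numpy as np

def sigmoid(z):
    return 1/(1 + np.exp(-z))

def sigmoid_prime(z):
    # Derivative of the sigmoid: sigmoid(z)*(1 - sigmoid(z))
    s = sigmoid(z)
    return s*(1 - s)

for x in [0, 2, 5, 10]:
    print(x, sigmoid_prime(x))
    # prints approximately 0.25, 0.105, 0.0066, 0.000045

With gradients that small, the weight updates shrink and learning stalls. The full network: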

import numpy as np
import matplotlib.pyplot as plt
from operator import xor

class neuralNetwork:
    def __init__(self):
        # Network dimensions
        self.noOfInputNeurons = 2
        self.noOfOutputNeurons = 1
        self.noOfHiddenLayerNeurons = 2

        # Initialize weights randomly (note: this network has no bias terms)
        self.W1 = np.random.rand(self.noOfInputNeurons, self.noOfHiddenLayerNeurons)
        self.W2 = np.random.rand(self.noOfHiddenLayerNeurons, self.noOfOutputNeurons)

    def relu(self, z):
        # Rectified linear unit: max(0, z), applied elementwise
        return np.maximum(0, z)

    def sigmoid(self, z):
        # Logistic sigmoid: 1 / (1 + e^(-z))
        return 1/(1 + np.exp(-z))

    def forward(self, X):
        # Propagate inputs through the network: X -> hidden (relu) -> output (relu)
        self.z2 = np.dot(X, self.W1)
        self.a2 = self.relu(self.z2)
        self.z3 = np.dot(self.a2, self.W2)
        yHat = self.relu(self.z3)
        return yHat

    def costFunction(self, X, y):
        # Compute squared-error cost for given X, y, using weights already stored in the class
        self.yHat = self.forward(X)
        J = 0.5*sum((y - self.yHat)**2)
        return J

    def costFunctionPrime(self, X, y):
        # Compute the gradient of the cost with respect to W1 and W2 (backpropagation).
        # Note: sigmoid(z) is applied here rather than the exact derivative of the
        # relu used in forward(); this surrogate gradient still converges on XOR,
        # as the results below show.
        delta3 = np.multiply(-(y - self.yHat), self.sigmoid(self.z3))
        djw2 = np.dot(self.a2.T, delta3)
        delta2 = np.dot(delta3, self.W2.T)*self.sigmoid(self.z2)
        djw1 = np.dot(X.T, delta2)

        return djw1, djw2


if __name__ == "__main__":

    EPOCHS = 6000
    SCALAR = 0.01  # learning rate

    nn = neuralNetwork()
    COST_LIST = []

    inputs = [np.array([[0,0]]), np.array([[0,1]]), np.array([[1,0]]), np.array([[1,1]])]

    for epoch in range(1, EPOCHS):
        cost = 0
        for i in inputs:
            X = i  # one training sample
            y = xor(X[0][0], X[0][1])  # target output for that sample
            cost += nn.costFunction(X, y)[0]
            djw1, djw2 = nn.costFunctionPrime(X, y)
            # Gradient descent step
            nn.W1 = nn.W1 - SCALAR*djw1
            nn.W2 = nn.W2 - SCALAR*djw2
        COST_LIST.append(cost)

    plt.plot(np.arange(1, EPOCHS), COST_LIST)
    plt.ylim(0, 1)
    plt.xlabel('Epochs')
    plt.ylabel('Loss')
    plt.title('Epochs: ' + str(EPOCHS) + ', Scalar: ' + str(SCALAR))
    plt.show()

    inputs = [np.array([[0,0]]), np.array([[0,1]]), np.array([[1,0]]), np.array([[1,1]])]
    print("X\ty\ty_hat")
    for inp in inputs:
        print((int(inp[0][0]), int(inp[0][1])), xor(inp[0][0], inp[0][1]), round(nn.forward(inp)[0][0], 4), sep="\t")

End Result:

[Plot of training loss vs. epochs (title: 'Epochs: 6000, Scalar: 0.01')]

X       y       y_hat
(0, 0)  0       0.0
(0, 1)  1       0.9997
(1, 0)  1       0.9997
(1, 1)  0       0.0005

The weights obtained after training were:

nn.W1

[ [-0.81781753  0.71323677]
  [ 0.48803631 -0.71286155] ]

nn.W2

[ [ 2.04849235]
  [ 1.40170791] ]
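As a quick sanity check, you can plug these trained weights into the forward pass by hand. This standalone sketch (weight values copied from above) reproduces the y_hat column of the result table:

import numpy as np

def relu(z):
    return np.maximum(0, z)

# Trained weights copied from above
W1 = np.array([[-0.81781753, 0.71323677],
               [0.48803631, -0.71286155]])
W2 = np.array([[2.04849235],
               [1.40170791]])

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    a2 = relu(np.dot(np.array([x]), W1))  # hidden layer activations
    y_hat = relu(np.dot(a2, W2))          # output layer activation
    print(x, round(float(y_hat[0][0]), 4))
    # prints 0.0, 0.9997, 0.9997, 0.0005

Note how each hidden unit fires for exactly one of (0,1) and (1,0): for (0,1) only the first hidden unit is active (0.488 * 2.048 ≈ 0.9997), while for (1,1) the first-layer weights nearly cancel, which is how this network encodes XOR.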

I found the following YouTube series extremely helpful for understanding neural nets: Neural networks demystified

There is only so much that I know and that can be explained in this answer. If you want an even better understanding of neural nets, then I would suggest you go through the following link: cs231n: Modelling one neuron
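Regarding question 1 from the post: in the script above the error is computed on the output layer only, as J = 0.5*(y - yHat)**2, and summed over all four samples in one epoch (that is exactly what COST_LIST accumulates). A common stopping rule checks that per-epoch total against a threshold instead of running a fixed number of epochs. A minimal sketch, reusing nn, inputs, EPOCHS and SCALAR from the script; TOLERANCE is an assumed value, not one from the experiment:

TOLERANCE = 1e-3  # assumed acceptable total epoch error

for epoch in range(1, EPOCHS):
    cost = 0
    for i in inputs:
        X = i
        y = xor(X[0][0], X[0][1])
        cost += nn.costFunction(X, y)[0]
        djw1, djw2 = nn.costFunctionPrime(X, y)
        nn.W1 = nn.W1 - SCALAR*djw1
        nn.W2 = nn.W2 - SCALAR*djw2
    if cost < TOLERANCE:  # stop once the epoch error is acceptably small
        break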

Upvotes: 2
