H Froedge

Reputation: 306

Neural Network seems to be getting stuck on a single output with each execution

I've created a neural network with numpy to estimate sin(x) for an input x. The network has 21 output neurons (representing the numbers -1.0, -0.9, ..., 0.9, 1.0), but it does not learn; I think I implemented the neuron architecture incorrectly when I defined the feedforward mechanism.

When I execute the code, the number of test inputs it estimates correctly sits around 48/1000, which happens to be the average count per category if you split 1000 test data points across 21 categories. Looking at the network output, you can see that the network just starts picking a single output value for every input. For example, it may pick -0.5 as the estimate for y regardless of the x you give it. Where did I go wrong here? This is my first network. Thank you!

import random
import numpy as np
import math
class Network(object):

    def __init__(self,inputLayerSize,hiddenLayerSize,outputLayerSize):

        #Create weight vector arrays to represent each layer size and initialize indices randomly on a Gaussian distribution.
        self.layer1 = np.random.randn(hiddenLayerSize,inputLayerSize)
        self.layer1_activations = np.zeros((hiddenLayerSize, 1))
        self.layer2 = np.random.randn(outputLayerSize,hiddenLayerSize)
        self.layer2_activations = np.zeros((outputLayerSize, 1))

        self.outputLayerSize = outputLayerSize
        self.inputLayerSize = inputLayerSize
        self.hiddenLayerSize = hiddenLayerSize

        # print(self.layer1)
        # print()
        # print(self.layer2)

        # self.weights = [np.random.randn(y,x)
        #                 for x, y in zip(sizes[:-1], sizes[1:])]

    def feedforward(self, network_input):

        #Propagate forward through network as if doing this by hand.
        #first layer's output activations:
        for neuron in range(self.hiddenLayerSize):
            self.layer1_activations[neuron] = 1/(1+np.exp(network_input * self.layer1[neuron]))

        #second layer's output activations use layer1's activations as input:
        for neuron in range(self.outputLayerSize):
            for weight in range(self.hiddenLayerSize):
                self.layer2_activations[neuron] += self.layer1_activations[weight]*self.layer2[neuron][weight]
            self.layer2_activations[neuron] = 1/(1+np.exp(self.layer2_activations[neuron]))


        #convert layer 2 activation numbers to a single output. The neuron (weight vector) with highest activation will be output.
        outputs = [x / 10 for x in range(-int((self.outputLayerSize/2)), int((self.outputLayerSize/2))+1, 1)] #range(-10, 11, 1)

        return(outputs[np.argmax(self.layer2_activations)])

    def train(self, training_pairs, epochs, minibatchsize, learn_rate):
        #apply gradient descent
        test_data = build_sinx_data(1000)
        for epoch in range(epochs):
            random.shuffle(training_pairs)
            minibatches = [training_pairs[k:k + minibatchsize] for k in range(0, len(training_pairs), minibatchsize)]
            for minibatch in minibatches:
                loss = 0 #calculate loss for each minibatch

                #Begin training
                for x, y in minibatch:
                    network_output = self.feedforward(x)
                    loss += (network_output - y) ** 2
                    #adjust weights by abs(loss)*sigmoid(network_output)*(1-sigmoid(network_output))*learn_rate
                loss /= (2*len(minibatch))
                adjustWeights = loss*(1/(1+np.exp(-network_output)))*(1-(1/(1+np.exp(-network_output))))*learn_rate
                self.layer1 += adjustWeights
                #print(adjustWeights)
                self.layer2 += adjustWeights
                #when line 63 placed here, results did not improve during minibatch.
            print("Epoch {0}: {1}/{2} correct".format(epoch, self.evaluate(test_data), len(test_data)))
        print("Training Complete")

    def evaluate(self, test_data):
        """
        Returns the number of test inputs which the network evaluates correctly.
        The output is assumed to be the neuron in the output layer with the highest activation.
        :param test_data: test data set identical in form to train data set.
        :return: integer sum
        """
        correct = 0
        for x, y in test_data:
            output = self.feedforward(x)
            if output == y:
                correct+=1
        return(correct)

def build_sinx_data(data_points):
    """
    Creates a list of tuples (x value, expected y value) for the sin(x) function.
    :param data_points: number of desired data points
    :return: list of tuples (x value, expected y value)
    """
    x_vals = []
    y_vals = []
    for i in range(data_points):
        #parameter of randint signifies range of x values to be used*10
        x_vals.append(random.randint(-2000,2000)/10)
        y_vals.append(round(math.sin(x_vals[i]),1))
    return (list(zip(x_vals,y_vals)))
# training_pairs, epochs, minibatchsize, learn_rate

sinx_test = Network(1,21,21)
print(sinx_test.feedforward(10))
sinx_test.train(build_sinx_data(600),20,10,2)
print(sinx_test.feedforward(10))

Upvotes: 3

Views: 1694

Answers (2)

H Froedge

Reputation: 306

I changed how my loss function was integrated into the network and correctly implemented gradient descent. I also removed the use of mini-batches and simplified what the network is trying to do. I now have a network which attempts to classify an input as positive or negative.

Some extremely helpful guides I used to fix things up:

Chapters 1 and 2 of Neural Networks and Deep Learning, by Michael Nielsen, available for free at http://neuralnetworksanddeeplearning.com/chap1.html. This book gives thorough explanations of how neural nets work, including breakdowns of the math behind their execution.

Backpropagation from the Beginning, by Erik Hallström, linked by Maxim. https://medium.com/@erikhallstrm/backpropagation-from-the-beginning-77356edf427d . Not as thorough as the above guide, but I kept both open concurrently, as this guide is more to the point about what is important and how to apply the mathematical formulas that are thoroughly explained in Nielsen's book.

How to build a simple neural network in 9 lines of Python code https://medium.com/technology-invention-and-more/how-to-build-a-simple-neural-network-in-9-lines-of-python-code-cc8f23647ca1 . A useful and fast introduction to some neural networking basics.

Here is my (now functioning) code:

import random
import numpy as np
import scipy
import math
class Network(object):

    def __init__(self,inputLayerSize,hiddenLayerSize,outputLayerSize):

        #Each layer is represented by its weights array plus activation and inputsums vectors.
        self.layer1 = np.random.randn(hiddenLayerSize,inputLayerSize)
        self.layer2 = np.random.randn(outputLayerSize,hiddenLayerSize)

        self.layer1_activations = np.zeros((hiddenLayerSize, 1))
        self.layer2_activations = np.zeros((outputLayerSize, 1))

        self.layer1_inputsums = np.zeros((hiddenLayerSize, 1))
        self.layer2_inputsums = np.zeros((outputLayerSize, 1))

        self.layer1_errorsignals = np.zeros((hiddenLayerSize, 1))
        self.layer2_errorsignals = np.zeros((outputLayerSize, 1))

        self.layer1_deltaw = np.zeros((hiddenLayerSize, inputLayerSize))
        self.layer2_deltaw = np.zeros((outputLayerSize, hiddenLayerSize))

        self.outputLayerSize = outputLayerSize
        self.inputLayerSize = inputLayerSize
        self.hiddenLayerSize = hiddenLayerSize
        print()
        print(self.layer1)
        print()
        print(self.layer2)
        print()
        # self.weights = [np.random.randn(y,x)
        #                 for x, y in zip(sizes[:-1], sizes[1:])]

    def feedforward(self, network_input):
        #Calculate the inputsum and activation for each neuron in the first layer
        for neuron in range(self.hiddenLayerSize):
            self.layer1_inputsums[neuron] = network_input * self.layer1[neuron]
            self.layer1_activations[neuron] = self.sigmoid(self.layer1_inputsums[neuron])

        # Calculate the inputsum and activation for each neuron in the second layer. Notice that each neuron in the second layer is represented by a
        # weights vector, consisting of all weights leading out of the kth neuron in layer (l-1) to the jth neuron in layer l.
        self.layer2_inputsums = np.zeros((self.outputLayerSize, 1))
        for neuron in range(self.outputLayerSize):
            for weight in range(self.hiddenLayerSize):
                self.layer2_inputsums[neuron] += self.layer1_activations[weight]*self.layer2[neuron][weight]
            self.layer2_activations[neuron] = self.sigmoid(self.layer2_inputsums[neuron])

        return self.layer2_activations

    def interpreted_output(self, network_input):
        #convert layer 2 activation numbers to a single output. The neuron (weight vector) with highest activation will be output.
        self.feedforward(network_input)
        outputs = [x / 10 for x in range(-int((self.outputLayerSize/2)), int((self.outputLayerSize/2))+1, 1)] #range(-10, 11, 1)
        return(outputs[np.argmax(self.layer2_activations)])

    # def build_expected_output(self, training_data):
    #     #Views expected output number y for each x to generate an expected output vector from the network
    #     index=0
    #     for pair in training_data:
    #         expected_output_vector = np.zeros((self.outputLayerSize,1))
    #         x = training_data[0]
    #         y = training_data[1]
    #         for i in range(-int((self.outputLayerSize / 2)), int((self.outputLayerSize / 2)) + 1, 1):
    #             if y == i / 10:
    #                 expected_output_vector[i] = 1
    #                 #expect the target category to be a 1.
    #                 break
    #         training_data[index][1] = expected_output_vector
    #         index+=1
    #     return training_data

    def train(self, training_data, learn_rate):
        self.backpropagate(training_data, learn_rate)

    def backpropagate(self, train_data, learn_rate):
        #Perform for each x,y pair.
        for datapair in range(len(train_data)):
            x = train_data[datapair][0]
            y = train_data[datapair][1]
            self.feedforward(x)
           # print("l2a " + str(self.layer2_activations))
           # print("l1a " + str(self.layer1_activations))
           # print("l2 " + str(self.layer2))
           # print("l1 " + str(self.layer1))
            for neuron in range(self.outputLayerSize):
                #Calculate first error equation for error signals of output layer neurons
                self.layer2_errorsignals[neuron] = (self.layer2_activations[neuron] - y[neuron]) * self.sigmoid_prime(self.layer2_inputsums[neuron])


            #Use recursive formula to calculate error signals of hidden layer neurons
            self.layer1_errorsignals = np.multiply(np.array(np.matrix(self.layer2.T) * np.matrix(self.layer2_errorsignals)) , self.sigmoid_prime(self.layer1_inputsums))
            #print(self.layer1_errorsignals)
            # for neuron in range(self.hiddenLayerSize):
            #     #Use recursive formula to calculate error signals of hidden layer neurons
            #     self.layer1_errorsignals[neuron] = np.multiply(self.layer2[neuron].T,self.layer2_errorsignals[neuron]) * self.sigmoid_prime(self.layer1_inputsums[neuron])

            #Partial derivative of C with respect to weight for connection from kth neuron in (l-1)th layer to jth neuron in lth layer is
            #(jth error signal in lth layer) * (kth activation in (l-1)th layer.)
            #Update all weights for network at each iteration of a training pair.

            #Update weights in second layer
            for neuron in range(self.outputLayerSize):
                for weight in range(self.hiddenLayerSize):
                    self.layer2_deltaw[neuron][weight] = self.layer2_errorsignals[neuron]*self.layer1_activations[weight]*(-learn_rate)

            self.layer2 += self.layer2_deltaw

            #Update weights in first layer
            for neuron in range(self.hiddenLayerSize):
                self.layer1_deltaw[neuron] = self.layer1_errorsignals[neuron]*(x)*(-learn_rate)

            self.layer1 += self.layer1_deltaw
            #Comment/Uncomment to enable error evaluation.
            #print("Epoch {0}: Error: {1}".format(datapair, self.evaluate(test_data)))
            # print("l2a " + str(self.layer2_activations))
            # print("l1a " + str(self.layer1_activations))
            # print("l1 " + str(self.layer1))
            # print("l2 " + str(self.layer2))



    def evaluate(self, test_data):
        error = 0
        for x, y in test_data:
            #x is integer, y is single element np.array
            output = self.feedforward(x)
            error += y - output
        return error


#eval function for sin(x)
    # def evaluate(self, test_data):
    #     """
    #     Returns number of test inputs which network evaluates correctly.
    #     The ouput assumed to be neuron in output layer with highest activation
    #     :param test_data: test data set identical in form to train data set.
    #     :return: integer sum
    #     """
    #     correct = 0
    #     for x, y in test_data:
    #         outputs = [x / 10 for x in range(-int((self.outputLayerSize / 2)), int((self.outputLayerSize / 2)) + 1,
    #                                          1)]  # range(-10, 11, 1)
    #         newy = outputs[np.argmax(y)]
    #         output = self.interpreted_output(x)
    #         #print("output: " + str(output))
    #         if output == newy:
    #             correct+=1
    #     return(correct)

    def sigmoid(self, z):
        return 1 / (1 + np.exp(-z))

    def sigmoid_prime(self, z):
        return (1 - self.sigmoid(z)) * self.sigmoid(z)

def build_simple_data(data_points):
    x_vals = []
    y_vals = []
    for each in range(data_points):
        x = random.randint(-3,3)
        expected_output_vector = np.zeros((1, 1))
        if x > 0:
            expected_output_vector[[0]] = 1
        else:
            expected_output_vector[[0]] = 0

        x_vals.append(x)
        y_vals.append(expected_output_vector)
    print(list(zip(x_vals,y_vals)))
    print()
    return (list(zip(x_vals,y_vals)))


simpleNet = Network(1, 3, 1)
# print("Pretest")
# print(simpleNet.feedforward(-3))
# print(simpleNet.feedforward(10))
# init_weights_l1 = simpleNet.layer1
# init_weights_l2 = simpleNet.layer2
# simpleNet.train(build_simple_data(10000),.1)
# #sometimes Error converges to 0, sometimes error converges to 10.
# print("Initial Weights:")
# print(init_weights_l1)
# print(init_weights_l2)
# print("Final Weights")
# print(simpleNet.layer1)
# print(simpleNet.layer2)
# print("Post-test")
# print(simpleNet.feedforward(-3))
# print(simpleNet.feedforward(10))

def test_network(iterations,net,training_points):
    """
    Casually evaluates pre and post test
    :param iterations: number of trials to be run
    :param net: name of network to evaluate.
    :param training_points: size of training data to be used
    :return: four 1x1 arrays.
    """
    pretest_negative = 0
    pretest_positive = 0
    posttest_negative = 0
    posttest_positive = 0
    for each in range(iterations):
        pretest_negative += net.feedforward(-10)
        pretest_positive += net.feedforward(10)
    net.train(build_simple_data(training_points),.1)
    for each in range(iterations):
        posttest_negative += net.feedforward(-10)
        posttest_positive += net.feedforward(10)
    return(pretest_negative/iterations, pretest_positive/iterations, posttest_negative/iterations, posttest_positive/iterations)

print(test_network(10000, simpleNet, 10000))

While much differs between this code and the code posted in the OP, one particular difference is interesting. In the original feedforward method, notice:

    #second layer's output activations use layer1's activations as input:
    for neuron in range(self.outputLayerSize):
        for weight in range(self.hiddenLayerSize):
            self.layer2_activations[neuron] += self.layer1_activations[weight]*self.layer2[neuron][weight]
        self.layer2_activations[neuron] = 1/(1+np.exp(self.layer2_activations[neuron]))

The line

self.layer2_activations[neuron] += self.layer1_activations[weight]*self.layer2[neuron][weight]

resembles

self.layer2_inputsums[neuron] += self.layer1_activations[weight]*self.layer2[neuron][weight]

in the updated code. This line performs the dot product between each weight vector and the input vector (the activations from layer 1) to arrive at the input sum for a neuron, commonly referred to as z (think sigmoid(z)). In my network, the derivative of the sigmoid function, sigmoid_prime, is used to calculate the gradient of the cost function with respect to all the weights, by multiplying sigmoid_prime(z) by the network's error between actual and expected output. If z is very big (and positive), the neuron will have an activation value very close to 1, which means the network is confident that that neuron should be activating. The same is true if z is very negative. The network, then, doesn't want to radically adjust weights that it is happy with, so the scale of the change in each weight for a neuron is given by the gradient of sigmoid(z), sigmoid_prime(z). Very large z means a very small gradient and a very small change applied to the weights (the gradient of sigmoid is maximized at z = 0, when the network is unsure how a neuron should be categorized and the activation for that neuron is 0.5).
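
As a quick numeric illustration of that saturation (a standalone sketch, not part of the posted code), evaluating sigmoid_prime at a few values of z shows how fast the gradient collapses as z grows:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_prime(z):
    # derivative of the sigmoid: maximal (0.25) at z = 0, nearly zero for large |z|
    return sigmoid(z) * (1 - sigmoid(z))

for z in [0, 2, 5, 10, 20]:
    print(z, sigmoid_prime(z))
# roughly: 0.25, 0.105, 0.0066, 4.5e-05, 2.1e-09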

Since I was continually adding onto each neuron's input_sum (z) and never resetting it before computing dot(weights, activations) for a new input, the value of z kept growing, continually slowing the rate of change of the weights until weight modification ground to a standstill. I added the following line to cope with this:

self.layer2_inputsums = np.zeros((self.outputLayerSize, 1))
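
An alternative that avoids the accumulation bug entirely is to build the whole input-sum vector with numpy.dot, so there is no running accumulator to forget to reset. A minimal sketch (the helper layer_forward is hypothetical, not part of the posted class):

import numpy as np

def layer_forward(weights, activations_in):
    # weights: (n_out, n_in) array, activations_in: (n_in, 1) column vector.
    # np.dot produces a fresh input-sum vector on every call, so nothing carries over.
    inputsums = np.dot(weights, activations_in)
    return inputsums, 1 / (1 + np.exp(-inputsums))

# tiny usage example with random weights
w = np.random.randn(4, 3)
a = np.random.randn(3, 1)
z, out = layer_forward(w, a)
print(z.shape, out.shape)  # (4, 1) (4, 1)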

The newly posted network can be copied and pasted into an editor and executed so long as you have the numpy module installed. The final line of output will be a list of 4 arrays representing the final network output. The first two are the pretest values for a negative and a positive input, respectively; these should be random. The last two are post-test values showing how well the network classifies positive and negative numbers: a number near 0 denotes negative, near 1 denotes positive.

Upvotes: 0

Maxim

Reputation: 53758

I didn't thoroughly examine all of your code, but some issues are clearly visible:

  • The * operator doesn't perform matrix multiplication in numpy; you have to use numpy.dot. This affects, for instance, these lines: network_input * self.layer1[neuron], self.layer1_activations[weight]*self.layer2[neuron][weight], etc. (See the sketch after this list for a vectorized version.)

  • Seems like you are solving your problem via classification (selecting 1 out of 21 classes), but using L2 loss. This is somewhat mixed up. You have two options: either stick to classification and use a cross entropy loss function, or perform regression (i.e. predict the numeric value) with L2 loss.

  • You should definitely extract the sigmoid function to avoid writing the same expression all over again:

    def sigmoid(z):
      return 1 / (1 + np.exp(-z))
    
    def sigmoid_derivative(x):
      return sigmoid(x) * (1 - sigmoid(x))
    
  • You perform the same update to self.layer1 and self.layer2, which is clearly wrong. Take some time to analyze how exactly backpropagation works.
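
Tying these points together, here is a minimal sketch (my own illustrative code with made-up names such as TinyRegressionNet, not a fix of the asker's exact class) of a two-layer network that uses numpy.dot for the matrix products, treats the task as regression on sin(x) with an L2 loss and a linear output neuron, and applies a separate backpropagated gradient to each layer:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_derivative(z):
    return sigmoid(z) * (1 - sigmoid(z))

class TinyRegressionNet(object):
    def __init__(self, hidden_size):
        self.w1 = np.random.randn(hidden_size, 1)   # input -> hidden weights
        self.b1 = np.zeros((hidden_size, 1))
        self.w2 = np.random.randn(1, hidden_size)   # hidden -> output weights
        self.b2 = np.zeros((1, 1))

    def feedforward(self, x):
        z1 = np.dot(self.w1, x) + self.b1            # (hidden, 1): dot, not *
        a1 = sigmoid(z1)
        z2 = np.dot(self.w2, a1) + self.b2           # (1, 1): linear output for regression
        return z1, a1, z2

    def train_step(self, x, y, learn_rate):
        z1, a1, y_hat = self.feedforward(x)
        # L2 loss 0.5 * (y_hat - y)^2, so dL/dy_hat = (y_hat - y)
        delta2 = y_hat - y                            # output-layer error signal
        delta1 = np.dot(self.w2.T, delta2) * sigmoid_derivative(z1)  # backpropagated error
        # each layer gets its own gradient; the two updates are not the same number
        self.w2 -= learn_rate * np.dot(delta2, a1.T)
        self.b2 -= learn_rate * delta2
        self.w1 -= learn_rate * np.dot(delta1, x.T)
        self.b1 -= learn_rate * delta1

net = TinyRegressionNet(hidden_size=20)
for _ in range(20000):
    x = np.random.uniform(-np.pi, np.pi, size=(1, 1))
    net.train_step(x, np.sin(x), learn_rate=0.05)
print(net.feedforward(np.array([[1.0]]))[2], np.sin(1.0))  # predicted vs true sin(1)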

Upvotes: 1
