f.rodrigues

Reputation: 3587

Neural Net Back Propagation not training

I made a neural net and now I'm attempting to implement the back-propagation algorithm.

I've used this diagram (PDF file) to help work out the math behind it. Since I'm no engineer I may be using it incorrectly, but I would like some insights.

The neural net is fixed in size (2 inputs, 2 hidden layers of 3 hidden nodes each, 2 output nodes), but I plan to change that later. I'm mainly concerned about the back-propagation algorithm.

The problem is: the back-propagation doesn't seem to have any effect on the net's output, even though the weights are changing at each step of the algorithm.

import numpy as np
import math

class NeuralNetwork:
    def __init__(self, learning_rate=0.0001):
        self.learning_rate = learning_rate

        self.weights_hidden_1 = np.arange(0.1, 0.7, 0.1).reshape((2, 3))
        self.weights_hidden_2 = np.arange(0.7, 1.6, 0.1).reshape((3, 3))
        self.weights_output = np.arange(1.6, 2.11, 0.1).reshape(3, 2)

        self.input_values = None
        self.results_hidden_1 = None
        self.results_hidden_2 = None
        self.results_output = None

    @staticmethod
    def activation(x):
        """Sigmoid function"""
        try:
            return 1 / (1 + math.e ** -x)
        except OverflowError:
            return 0

    def delta_weights_output(self, expected_results):
        errors = []
        for k, result in enumerate(self.results_output):
            error = result * (1 - result) * (result - expected_results[k])
            errors.append(error)
        errors = np.array(errors)

        return errors

    @staticmethod
    def delta_weights_hidden(next_layer_results, next_layer_weights, next_layer_errors):
        errors = []
        for j, next_layer_result in enumerate(next_layer_results):
            error_differences = []
            for n, next_layer_error in enumerate(next_layer_errors):
                error_difference = next_layer_weights[j][n] * next_layer_error
                error_differences.append(error_difference)
            error = next_layer_result * (1 - next_layer_result) * sum(error_differences)
            errors.append(error)

        return errors

    def set_weight(self, weights, errors, results):
        for j, result in enumerate(results):
            for n, error in enumerate(errors):
                new_weight = - self.learning_rate * error * result
                weights[j][n] = new_weight

    def back_propagate(self, expected_results):
        output_error = self.delta_weights_output(expected_results)

        self.set_weight(
            self.weights_output,
            output_error,
            self.results_hidden_2
        )

        error_hidden_layer_2 = self.delta_weights_hidden(self.results_hidden_2,
                                                         self.weights_output,
                                                         output_error)
        self.set_weight(
            self.weights_hidden_2,
            error_hidden_layer_2,
            self.results_hidden_1
        )

        error_hidden_layer_1 = self.delta_weights_hidden(self.results_hidden_1,
                                                         self.weights_hidden_2,
                                                         error_hidden_layer_2)
        self.set_weight(
            self.weights_hidden_1,
            error_hidden_layer_1,
            self.input_values)

    def feed_forward(self):
        self.results_hidden_1 = np.array(
            map(self.activation, self.input_values.dot(self.weights_hidden_1))
        )
        self.results_hidden_2 = np.array(
            map(self.activation, self.results_hidden_1.dot(self.weights_hidden_2))
        )
        self.results_output = np.array(
            map(self.activation, self.results_hidden_2.dot(self.weights_output))
        )

    def start_net(self, input_values):
        self.input_values = np.array(input_values)
        self.feed_forward()
        return self.results_output


ANN = NeuralNetwork()
for n in xrange(10):
    result = ANN.start_net([1, 2])
    print result # should output [0.4, 0.6] after fixing the weights
    ANN.back_propagate([0.4, 0.6])

EDIT1:

Following IVlad's answer:

class NeuralNetwork:
    def __init__(self, learning_rate=0.0001):
        self.learning_rate = learning_rate

        self.weights_hidden_1 = np.random.random((2,3))
        self.weights_hidden_2 = np.random.random((3, 3))
        self.weights_output = np.random.random((3, 2))

    # ...

    def start_net(self, input_values):
        self.input_values = np.array(input_values)
        self.input_values = (self.input_values - np.mean(self.input_values)) / np.std(self.input_values)
        # ...

But still no change, even after 100,000 rounds of learning. I'm getting [ 0.49999953  0.50000047].

Upvotes: 2

Views: 156

Answers (1)

IVlad

Reputation: 43477

There are many things that can go wrong.

First of all, you're not properly initializing your weights:

self.weights_hidden_1 = np.arange(0.1, 0.7, 0.1).reshape((2, 3))
self.weights_hidden_2 = np.arange(0.7, 1.6, 0.1).reshape((3, 3))
self.weights_output = np.arange(1.6, 2.11, 0.1).reshape(3, 2)

You should initialize the weights randomly, and they should be in [0, 1]. The sigmoid function returns values very close to 1 for large inputs, so with your large weights you will keep getting outputs near 1. Its derivative is then very small, which contributes to why you're seeing such slow learning.
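
As a quick sketch of why this matters (illustrative numbers, not code from the original post): with the arange-based weights, the pre-activation sums for the input [1, 2] grow to roughly 5 by the output layer, where the sigmoid is nearly flat, so the result * (1 - result) factor in the deltas is close to zero.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# The sigmoid saturates quickly: as the pre-activation z grows, the output
# approaches 1 and the derivative s * (1 - s) collapses towards 0.
for z in (0.5, 2.0, 5.0, 10.0):
    s = sigmoid(z)
    print("z=%.1f  sigmoid=%.4f  derivative=%.4f" % (z, s, s * (1 - s)))

# Random weights in [0, 1) keep the pre-activations, and therefore the
# gradients, in a far more useful range for the input [1, 2]:
w = np.random.random((2, 3))
print(sigmoid(np.array([1.0, 2.0]).dot(w)))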

After that, you only seem to be doing ten rounds of learning. You should be doing a lot more: probably over 100, maybe even over 2000, with basic gradient descent.
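
For reference, a longer training loop in the style of the snippet from the question might look like this (the 5000 iterations here are just a placeholder; the number that actually converges depends on the learning rate):

ANN = NeuralNetwork()
for n in xrange(5000):  # far more than the original 10 rounds
    ANN.start_net([1, 2])
    ANN.back_propagate([0.4, 0.6])
print(ANN.start_net([1, 2]))  # inspect the output only after training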

Then, make sure you normalize your input data by subtracting the mean and dividing each feature by the standard deviation (but only if you have more than one training instance):

self.input_values = (self.input_values - np.mean(self.input_values, axis=0)) / np.std(self.input_values, axis=0)
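
For example, with a hypothetical batch of several training instances (made-up numbers), the per-feature normalization looks like this:

import numpy as np

# Hypothetical batch: 4 training instances with 2 features each.
X = np.array([[1.0, 2.0],
              [3.0, 6.0],
              [5.0, 10.0],
              [7.0, 14.0]])

# Subtract each column's mean and divide by its standard deviation.
X_norm = (X - np.mean(X, axis=0)) / np.std(X, axis=0)
print(X_norm.mean(axis=0))  # approximately [0, 0]
print(X_norm.std(axis=0))   # approximately [1, 1]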

I don't see a bug in the formulas, so I'm guessing it's probably the way you initialize the weights.

Also consider using the hyperbolic tangent activation function. It performs better in my experience. You can use it as np.tanh(x) in numpy, and its derivative is 1 - result ** 2.
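
As a rough sketch (not code from the original answer), swapping the activation in the question's NeuralNetwork class could look something like the hypothetical subclass below; the same 1 - result ** 2 substitution would also be needed wherever result * (1 - result) appears, for example in delta_weights_hidden:

import numpy as np

class NeuralNetworkTanh(NeuralNetwork):  # hypothetical subclass, for illustration only
    @staticmethod
    def activation(x):
        """Hyperbolic tangent activation."""
        return np.tanh(x)

    def delta_weights_output(self, expected_results):
        errors = []
        for k, result in enumerate(self.results_output):
            # tanh derivative is 1 - result ** 2 instead of result * (1 - result)
            error = (1 - result ** 2) * (result - expected_results[k])
            errors.append(error)
        return np.array(errors)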

Upvotes: 1
