Reputation: 3587
I made a neural net and now I'm attempting to implement the back-propagation algorithm.
I've used this diagram (PDF file) to help work out the math behind it. Since I'm no engineer, I may have applied it incorrectly, but I would like some insights.
The neural net is fixed in size (2 inputs, 2 hidden layers with 3 hidden nodes each, 2 output nodes), but I plan to make it configurable later. I'm mainly concerned with the back-propagation algorithm.
The problem is that the back-propagation doesn't seem to affect the net's output, even though the weights are changing on each step of the algorithm.
import numpy as np
import math


class NeuralNetwork:
    def __init__(self, learning_rate=0.0001):
        self.learning_rate = learning_rate
        self.weights_hidden_1 = np.arange(0.1, 0.7, 0.1).reshape((2, 3))
        self.weights_hidden_2 = np.arange(0.7, 1.6, 0.1).reshape((3, 3))
        self.weights_output = np.arange(1.6, 2.11, 0.1).reshape(3, 2)
        self.input_values = None
        self.results_hidden_1 = None
        self.results_hidden_2 = None
        self.results_output = None

    @staticmethod
    def activation(x):
        """Sigmoid function"""
        try:
            return 1 / (1 + math.e ** -x)
        except OverflowError:
            return 0

    def delta_weights_output(self, expected_results):
        errors = []
        for k, result in enumerate(self.results_output):
            error = result * (1 - result) * (result - expected_results[k])
            errors.append(error)
        errors = np.array(errors)
        return errors

    @staticmethod
    def delta_weights_hidden(next_layer_results, next_layer_weights, next_layer_errors):
        errors = []
        for j, next_layer_result in enumerate(next_layer_results):
            error_differences = []
            for n, next_layer_error in enumerate(next_layer_errors):
                error_difference = next_layer_weights[j][n] * next_layer_error
                error_differences.append(error_difference)
            error = next_layer_result * (1 - next_layer_result) * sum(error_differences)
            errors.append(error)
        return errors

    def set_weight(self, weights, errors, results):
        for j, result in enumerate(results):
            for n, error in enumerate(errors):
                new_weight = - self.learning_rate * error * result
                weights[j][n] = new_weight

    def back_propagate(self, expected_results):
        output_error = self.delta_weights_output(expected_results)
        self.set_weight(
            self.weights_output,
            output_error,
            self.results_hidden_2
        )

        error_hidden_layer_2 = self.delta_weights_hidden(self.results_hidden_2,
                                                         self.weights_output,
                                                         output_error)
        self.set_weight(
            self.weights_hidden_2,
            error_hidden_layer_2,
            self.results_hidden_1
        )

        error_hidden_layer_1 = self.delta_weights_hidden(self.results_hidden_1,
                                                         self.weights_hidden_2,
                                                         error_hidden_layer_2)
        self.set_weight(
            self.weights_hidden_1,
            error_hidden_layer_1,
            self.input_values)

    def feed_forward(self):
        self.results_hidden_1 = np.array(
            map(self.activation, self.input_values.dot(self.weights_hidden_1))
        )
        self.results_hidden_2 = np.array(
            map(self.activation, self.results_hidden_1.dot(self.weights_hidden_2))
        )
        self.results_output = np.array(
            map(self.activation, self.results_hidden_2.dot(self.weights_output))
        )

    def start_net(self, input_values):
        self.input_values = np.array(input_values)
        self.feed_forward()
        return self.results_output


ANN = NeuralNetwork()

for n in xrange(10):
    result = ANN.start_net([1, 2])
    print result  # should output [0.4, 0.6] after fixing the weights
    ANN.back_propagate([0.4, 0.6])
EDIT1:
Following IVlad's answer:
class NeuralNetwork:
    def __init__(self, learning_rate=0.0001):
        self.learning_rate = learning_rate
        self.weights_hidden_1 = np.random.random((2, 3))
        self.weights_hidden_2 = np.random.random((3, 3))
        self.weights_output = np.random.random((3, 2))
        # ...

    def start_net(self, input_values):
        self.input_values = np.array(input_values)
        self.input_values = (self.input_values - np.mean(self.input_values)) / np.std(self.input_values)
        # ...
But there is still no change, even after 100000 rounds of learning. I'm getting [ 0.49999953  0.50000047].
Upvotes: 2
Views: 156
Reputation: 43477
There are many things that can go wrong.
First of all, you're not properly initializing your weights:
self.weights_hidden_1 = np.arange(0.1, 0.7, 0.1).reshape((2, 3))
self.weights_hidden_2 = np.arange(0.7, 1.6, 0.1).reshape((3, 3))
self.weights_output = np.arange(1.6, 2.11, 0.1).reshape(3, 2)
You should initialize the weights randomly, and they should be in [0, 1]. The sigmoid function returns values very close to 1 for large inputs, so with your large weights you will keep getting outputs near 1. Its derivative will then be very small, which contributes to why you're seeing slow learning.
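For example, a minimal sketch of random initialization in [0, 1), reusing the layer shapes from the question (np.random.random draws uniformly from [0, 1)):

import numpy as np

weights_hidden_1 = np.random.random((2, 3))  # input layer -> hidden layer 1
weights_hidden_2 = np.random.random((3, 3))  # hidden layer 1 -> hidden layer 2
weights_output = np.random.random((3, 2))    # hidden layer 2 -> output layer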
After that, you only seem to be doing ten rounds of learning? You should be doing a lot more, probably over 100, maybe even over 2000 with basic gradient descent.
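As a rough sketch, assuming the NeuralNetwork class from the question, a longer training loop could look like this (the epoch count and learning rate here are just placeholders, not tuned values):

ANN = NeuralNetwork(learning_rate=0.1)

for epoch in xrange(2000):
    ANN.start_net([1, 2])          # forward pass
    ANN.back_propagate([0.4, 0.6])  # weight update toward the target

print ANN.start_net([1, 2])  # inspect the output after training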
Then, make sure you normalize your input data by subtracting the mean and dividing each feature by the standard deviation (but only if you have more than one training instance):
self.input_values = (self.input_values - np.mean(self.input_values, axis=0)) / np.std(self.input_values, axis=0)
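For illustration, with several training instances stacked row-wise (the sample data below is made up, and numpy is assumed to be imported as np), per-feature normalization looks like this:

X = np.array([[1.0, 2.0],
              [3.0, 5.0],
              [2.0, 4.0]])  # one training instance per row, one feature per column

X = (X - np.mean(X, axis=0)) / np.std(X, axis=0)  # zero mean, unit std per feature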
I don't see a bug in the formulas, so I'm guessing it's probably the way you initialize the weights.
Also consider using the hyperbolic tangent activation function. It performs better in my experience. You can use it as np.tanh(x) in numpy, and its derivative is 1 - result ** 2.
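A minimal sketch of the activation and its derivative as standalone helpers (the function names are illustrative; in the question's code the derivative would take the place of the result * (1 - result) factor in the delta computations):

import numpy as np

def tanh_activation(x):
    # squashes the input to the range (-1, 1)
    return np.tanh(x)

def tanh_derivative(result):
    # derivative of tanh expressed in terms of its own output
    return 1 - result ** 2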
Upvotes: 1