Reputation: 117
Here is a link to my project: https://github.com/aaronnoyes/neural-network/blob/master/nn.py
I have implemented a basic neural network in Python. By default it uses a sigmoid activation function and that works great. I'm trying to compare how the learning rate behaves across activation functions, so I added an option to use ReLU. When it runs with ReLU, however, the weights all drop to 0 almost immediately.
if (self.activation == 'relu'):
    # gradient of the squared-error loss w.r.t. weights2, using the relu derivative at the output
    d_weights2 = np.dot(self.layer1.T, (2*(self.y - self.output) * self.relu(self.output, True)))
    # gradient w.r.t. weights1, chained back through weights2 and the hidden-layer relu
    d_weights1 = np.dot(self.input.T, (np.dot(2*(self.y - self.output) * self.relu(self.output, True), self.weights2.T) * self.relu(self.layer1, True)))
I'm almost sure the issue is in lines 54-56 of my program (shown above) when I try to apply gradient descent. How can I fix this so the program will actually update weights appropriately? My relu implementation is as follows:
def relu(self, x, derivative=False):
    if derivative:
        return 1. * (x > 0)
    else:
        return x * (x > 0)
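On its own the helper seems to behave as expected; here is a quick standalone check with made-up values, just to rule it out:

import numpy as np

x = np.array([-2.0, -0.5, 0.5, 2.0])
print(x * (x > 0))   # forward pass: zero for the negative entries, x otherwise
print(1. * (x > 0))  # derivative:   [0. 0. 1. 1.]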
Upvotes: 0
Views: 811
Reputation: 2666
There are two problems with your code:
1. You are applying a relu to the output layer as well. The standard approach is to use an identity (linear) output activation for regression (sketched just below this list) and sigmoid/softmax for classification.
2. You are using a learning rate of 1, which is way too high. (Usual test values are 1e-2 and smaller.)
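For reference, here is what the identity-output (regression) variant would look like, as a minimal sketch reusing the attribute names from your snippets (the derivative of the identity activation is 1, so the output error passes straight through):

def feedforward(self):
    ...
    if (self.activation == 'relu'):
        self.layer1 = self.relu(np.dot(self.input, self.weights1))
        # identity (linear) output: no activation applied to the output layer
        self.output = np.dot(self.layer1, self.weights2)
    return self.output

def backprop(self):
    ...
    if (self.activation == 'relu'):
        error = 2*(self.y - self.output)
        d_weights2 = np.dot(self.layer1.T, error)
        d_weights1 = np.dot(self.input.T, np.dot(error, self.weights2.T) * self.relu(self.layer1, True))

Your targets are 0/1 class labels, though, so a sigmoid output is the better fit here.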
I changed the output activation to sigmoid even when relu is used in the hidden layer:
def feedforward(self):
    ...
    if (self.activation == 'relu'):
        self.layer1 = self.relu(np.dot(self.input, self.weights1))
        # relu only on the hidden layer; sigmoid on the output layer
        self.output = self.sigmoid(np.dot(self.layer1, self.weights2))
    return self.output
def backprop(self):
    ...
    if (self.activation == 'relu'):
        # the output layer is now sigmoid, so its derivative is used for the output error in both gradients
        d_weights2 = np.dot(self.layer1.T, (2*(self.y - self.output) * self.sigmoid(self.output, True)))
        d_weights1 = np.dot(self.input.T, (np.dot(2*(self.y - self.output) * self.sigmoid(self.output, True), self.weights2.T) * self.relu(self.layer1, True)))
and used a smaller learning rate
# update the weights with the derivative (slope) of the loss function
self.weights1 += .01 * d_weights1
self.weights2 += .01 * d_weights2
and this is the result:
Actual Output : [[ 0.00000] [ 1.00000] [ 1.00000] [ 0.00000]]
Predicted Output: [[ 0.10815] [ 0.92762] [ 0.94149] [ 0.05783]]
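And for completeness, here is a minimal self-contained version of how these pieces could fit together. The class layout, the 4-unit hidden layer, the XOR-style toy data (with a constant third column acting as a bias), and the training loop are assumptions for illustration based on your snippets, not code taken from your repository:

import numpy as np

class NeuralNetwork:
    # minimal sketch: relu hidden layer, sigmoid output, squared-error loss
    def __init__(self, x, y):
        self.input = x
        self.y = y
        self.weights1 = np.random.rand(x.shape[1], 4)  # assumed 4 hidden units
        self.weights2 = np.random.rand(4, 1)
        self.output = np.zeros(y.shape)

    def sigmoid(self, x, derivative=False):
        if derivative:
            # x is expected to already be a sigmoid output here
            return x * (1 - x)
        return 1 / (1 + np.exp(-x))

    def relu(self, x, derivative=False):
        if derivative:
            return 1. * (x > 0)
        return x * (x > 0)

    def feedforward(self):
        self.layer1 = self.relu(np.dot(self.input, self.weights1))
        self.output = self.sigmoid(np.dot(self.layer1, self.weights2))
        return self.output

    def backprop(self):
        # error at the output, pushed through the sigmoid derivative
        d_output = 2 * (self.y - self.output) * self.sigmoid(self.output, True)
        d_weights2 = np.dot(self.layer1.T, d_output)
        d_weights1 = np.dot(self.input.T,
                            np.dot(d_output, self.weights2.T) * self.relu(self.layer1, True))
        # small learning rate instead of 1
        self.weights1 += .01 * d_weights1
        self.weights2 += .01 * d_weights2

if __name__ == '__main__':
    X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]])  # assumed toy inputs
    y = np.array([[0], [1], [1], [0]])
    nn = NeuralNetwork(X, y)
    for _ in range(10000):       # assumed iteration count
        nn.feedforward()
        nn.backprop()
    print(nn.output)             # predictions should move toward [0, 1, 1, 0]

From there you can lower or raise the .01 learning rate and compare how your different activation options respond, which was your original goal.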
Upvotes: 0