Aaron

Reputation: 117

Neural Network ReLU Outputting All 0s

Here is a link to my project: https://github.com/aaronnoyes/neural-network/blob/master/nn.py

I have implemented a basic neural network in Python. By default it uses a sigmoid activation function, and that works great. I'm trying to compare the effect of the learning rate across activation functions, so I implemented an option for using ReLU. When it runs, however, the weights all drop immediately to 0.

if (self.activation == 'relu'):
    d_weights2 = np.dot(self.layer1.T, (2*(self.y - self.output) * self.relu(self.output, True)))
    d_weights1 = np.dot(self.input.T, (np.dot(2*(self.y - self.output) * self.relu(self.output, True), self.weights2.T) * self.relu(self.layer1, True)))

I'm almost sure the issue is in lines 54-56 of my program (shown above), where I try to apply gradient descent. How can I fix this so the program actually updates the weights appropriately? My relu implementation is as follows:

def relu(self, x, derivative=False):
    if derivative:
        return 1. * (x > 0)
    else:
        return x * (x > 0)
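
For reference, the helper itself behaves as expected on its own; here is a small standalone check (the function copied out of the class, with made-up sample values):

import numpy as np

def relu(x, derivative=False):
    # forward pass zeroes out negative entries;
    # derivative is 1. where x > 0 and 0. elsewhere
    if derivative:
        return 1. * (x > 0)
    return x * (x > 0)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x))        # negatives map to 0 (printed as -0.), positives pass through
print(relu(x, True))  # [0. 0. 0. 1.]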

Upvotes: 0

Views: 811

Answers (1)

cheersmate

Reputation: 2666

There are two problems with your code:

  • You are applying ReLU to the output layer as well. The standard approach is to use the identity as the output-layer activation for regression and sigmoid/softmax for classification (see the sigmoid sketch after this list).

  • You are using a learning rate of 1, which is way too high. (Usual test values are 1e-2 and smaller.)
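
The sigmoid helper itself isn't shown in the snippets below. Assuming it mirrors relu's (x, derivative=False) signature and, like the backprop call, receives the already-activated output, a minimal standalone sketch could look like this:

import numpy as np

def sigmoid(x, derivative=False):
    # the derivative branch assumes x is the already-activated output,
    # matching how self.sigmoid(self.output, True) is called in backprop
    if derivative:
        return x * (1.0 - x)
    return 1.0 / (1.0 + np.exp(-x))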

I changed the output activation to sigmoid, even when using relu activation in the hidden layer:

def feedforward(self):
    ...

    if (self.activation == 'relu'):
        self.layer1 = self.relu(np.dot(self.input, self.weights1))
        self.output = self.sigmoid(np.dot(self.layer1, self.weights2))

    return self.output

def backprop(self):
    ...

    if (self.activation == 'relu'):
        d_weights2 = np.dot(self.layer1.T, (2*(self.y - self.output) * self.sigmoid(self.output, True)))
        d_weights1 = np.dot(self.input.T, (np.dot(2*(self.y - self.output) * self.sigmoid(self.output, True), self.weights2.T) * self.relu(self.layer1, True)))

and used a smaller learning rate:

    # update the weights with the derivative (slope) of the loss function
    self.weights1 += .01 * d_weights1
    self.weights2 += .01 * d_weights2

and this is the result:

Actual Output:    [[ 0.00000] [ 1.00000] [ 1.00000] [ 0.00000]]
Predicted Output: [[ 0.10815] [ 0.92762] [ 0.94149] [ 0.05783]]
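
For completeness, the kind of driver that produces a run like this might look as follows; the import, the class name NeuralNetwork, its constructor signature, the iteration count, and the input matrix X are all assumptions (only feedforward, backprop, and the target vector appear above), so adapt it to the actual nn.py:

import numpy as np
from nn import NeuralNetwork   # hypothetical: class name and constructor are assumptions

# hypothetical inputs; y matches the "Actual Output" column above
X = np.array([[0, 0, 1],
              [0, 1, 1],
              [1, 0, 1],
              [1, 1, 1]])
y = np.array([[0], [1], [1], [0]])

nn = NeuralNetwork(X, y, activation='relu')   # assumed constructor signature
for _ in range(10000):                        # assumed iteration count
    nn.feedforward()
    nn.backprop()

print(nn.output)   # should end up close to y, as in the predicted output above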

Upvotes: 0
