yo bar

Reputation: 23

relu function neural network outputting 0 or 1

I tried implementing a simple neural network using both sigmoid and ReLU activation functions. With the sigmoid function I got some good outputs, but with ReLU I got an array of only 0's and 1's. (I need the ReLU function because I want to use the code for outputs > 1.)

import numpy as np

def relu(x):
    return np.maximum(0, x)

def reluDerivative(x):
    x[x <= 0] = 0
    x[x > 0] = 1
    return x

training_inputs = np.array([[9, 0, 1],
                            [7, 1, 1],
                            [8, 0, 1],
                            [5, 1, 1]])

training_outputs = np.array([[9, 7, 8, 5]]).T

np.random.seed(1)

synaptic_weights = 2 * np.random.random((3, 1)) - 1

for iteration in range(100000):

    outputs = relu(np.dot(training_inputs, synaptic_weights))

    error = training_outputs - outputs
    adjustments = error * reluDerivative(outputs)
    synaptic_weights += np.dot(training_inputs.T, adjustments)

print("output after training: \n", outputs)

Upvotes: 0

Views: 1477

Answers (1)

enerve

Reputation: 63

Update:

(Thanks for including the relu and reluDerivative methods)

The error is indeed in the reluDerivative(x) method.

When you do x[x<=0] = 0 you are modifying the given numpy array in place. The argument x is not a clone or deep copy of outputs; it is the very same numpy array. So when you modify x, you also modify outputs.

I hope you can figure out why this causes the bug - but let me know if you would like a further explanation.
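
For reference, here is a minimal sketch of a derivative that builds a fresh 0/1 array from a boolean mask instead of overwriting its argument (keeping the name from the question's code; treating the gradient at exactly 0 as 0 is a common convention):

def reluDerivative(x):
    # Build a new 0/1 array from a boolean mask; x (and therefore outputs) is left untouched.
    return (x > 0).astype(float)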

Update 2

It looks like the code has more issues than the one above, and these are a bit trickier:

  • If you step through the code with a debugger, you'll notice that, unfortunately, with the current random seed (1), the synaptic weights are initialized such that all your training examples produce a negative dot product, which the ReLU then sets to zero. The gradient in that region is zero, and this is one of the risks of using ReLU. How to mitigate this?

    • Well, you could use other seeds (e.g. seed=10), but this is not a satisfying solution.
    • This issue would be much less likely if you had a much larger training set (e.g. 100 instead of just 4) because it would be unlikely that all 100 result in negative dot-products.
    • I notice that the first item in every data row is much larger than the rest. Performing "normalization" on the data set would've avoided this problem. You can read up more on how to normalize the input.
    • Finally, this "zero gradient" problem with ReLUs is precisely why "LeakyReLU" was invented. In larger neural nets, regular ReLUs may be sufficient in practice, but in your simplistic example, LeakyReLU would surely have avoided the issue.
  • Once you solve the problems above, you will still notice another issue: the errors and gradients will blow up within a few iterations. This is because you're not yet using a "learning rate" parameter to constrain the rate at which the weights are updated. Read up on how to use a learning rate (or alpha) parameter; a rough sketch combining these suggestions follows after this list.
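
To make these suggestions concrete, here is a rough sketch of how the training loop might look with a LeakyReLU, column-wise scaling of the inputs, and a small learning rate. The negative slope (0.01), the learning rate (0.01), and the max-based scaling are illustrative choices, not tuned values:

import numpy as np

def leaky_relu(x, slope=0.01):
    # Pass positive values through; scale negative values by a small slope instead of zeroing them.
    return np.where(x > 0, x, slope * x)

def leaky_relu_derivative(x, slope=0.01):
    # Gradient is 1 in the positive region and `slope` in the negative region, so it never vanishes.
    return np.where(x > 0, 1.0, slope)

training_inputs = np.array([[9, 0, 1],
                            [7, 1, 1],
                            [8, 0, 1],
                            [5, 1, 1]], dtype=float)
training_outputs = np.array([[9, 7, 8, 5]], dtype=float).T

# Scale each column by its maximum so no single feature dominates the dot product.
scaled_inputs = training_inputs / training_inputs.max(axis=0)

np.random.seed(1)
synaptic_weights = 2 * np.random.random((3, 1)) - 1
learning_rate = 0.01

for iteration in range(100000):
    pre_activation = np.dot(scaled_inputs, synaptic_weights)
    outputs = leaky_relu(pre_activation)
    error = training_outputs - outputs
    # Differentiate w.r.t. the pre-activation values, not the already-activated outputs.
    adjustments = error * leaky_relu_derivative(pre_activation)
    synaptic_weights += learning_rate * np.dot(scaled_inputs.T, adjustments)

print("output after training: \n", outputs)

With these changes the outputs should settle near the target values instead of collapsing to an array of 0's and 1's.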

Good luck!

Upvotes: 1
