Reputation: 111
I am having trouble implementing backprop while using the ReLU activation function. My model has two hidden layers with 10 nodes in each hidden layer and one node in the output layer (thus 3 weight matrices and 3 bias vectors). The model works except for this broken backward_prop function; the same function does work when backprop uses the sigmoid activation function (included as comments in the function). So I believe I am getting the ReLU derivative wrong.
Can anyone push me in the right direction?
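For reference, the parameter shapes this architecture implies look roughly like the sketch below; init_params, n_x, and the 0.01 scaling are just illustrative, not my actual initialization code:
import numpy as np

def init_params(n_x, seed=0):
    # Illustrative initializer for the 10-10-1 architecture described above;
    # training examples are columns of X, so each weight matrix is (n_out, n_in).
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.standard_normal((10, n_x)) * 0.01, "b1": np.zeros((10, 1)),  # input -> hidden 1
        "W2": rng.standard_normal((10, 10)) * 0.01,  "b2": np.zeros((10, 1)),  # hidden 1 -> hidden 2
        "W3": rng.standard_normal((1, 10)) * 0.01,   "b3": np.zeros((1, 1)),   # hidden 2 -> output
    }
The two functions in question: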
# The derivative of the ReLU function is 1 if z > 0, and 0 if z <= 0
def relu_deriv(z):
    z[z > 0] = 1
    z[z <= 0] = 0
    return z
# Handles a single backward pass through the neural network
def backward_prop(X, y, c, p):
    """
    cache (c): includes activations (A) and linear transformations (Z)
    params (p): includes weights (W) and biases (b)
    """
    m = X.shape[1]  # Number of training examples

    dZ3 = c['A3'] - y
    dW3 = 1/m * np.dot(dZ3, c['A2'].T)
    db3 = 1/m * np.sum(dZ3, keepdims=True, axis=1)

    dZ2 = np.dot(p['W3'].T, dZ3) * relu_deriv(c['A2'])  # sigmoid: replace relu_deriv w/ (1 - np.power(c['A2'], 2))
    dW2 = 1/m * np.dot(dZ2, c['A1'].T)
    db2 = 1/m * np.sum(dZ2, keepdims=True, axis=1)

    dZ1 = np.dot(p['W2'].T, dZ2) * relu_deriv(c['A1'])  # sigmoid: replace relu_deriv w/ (1 - np.power(c['A1'], 2))
    dW1 = 1/m * np.dot(dZ1, X.T)
    db1 = 1/m * np.sum(dZ1, keepdims=True, axis=1)

    grads = {"dW1": dW1, "db1": db1, "dW2": dW2, "db2": db2, "dW3": dW3, "db3": db3}
    return grads
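For reference, here is a rough sketch of the kind of forward pass that produces this cache; forward_prop below is illustrative rather than my exact code, and the sigmoid output node is an assumption consistent with dZ3 = A3 - y under a cross-entropy loss:
def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_prop(X, p):
    # ReLU on both hidden layers, sigmoid on the single output node
    Z1 = np.dot(p['W1'], X) + p['b1']
    A1 = relu(Z1)
    Z2 = np.dot(p['W2'], A1) + p['b2']
    A2 = relu(Z2)
    Z3 = np.dot(p['W3'], A2) + p['b3']
    A3 = sigmoid(Z3)
    return {'Z1': Z1, 'A1': A1, 'Z2': Z2, 'A2': A2, 'Z3': Z3, 'A3': A3}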
Upvotes: 4
Views: 8557
Reputation: 464
Is your code throwing an error, or is the problem with training? Could you clarify?
Also, if you are doing binary classification, can you try making only the output activation sigmoid and keeping ReLU for the hidden layers?
Please give specifics.
Edit in reply:
Can you try this one?
def dReLU(x):
    return 1. * (x > 0)
I refer to: https://gist.github.com/yusugomori/cf7bce19b8e16d57488a
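Unlike relu_deriv in the question, this builds a new 0/1 array from the comparison x > 0 rather than assigning into its argument, so the cached activations are not modified in place. A drop-in usage would be the following (keeping the question's choice of applying the derivative to the cached A values, which gives the same mask as using the Z values, since ReLU is positive exactly where its input is):
dZ2 = np.dot(p['W3'].T, dZ3) * dReLU(c['A2'])  # c['A2'] is left unchanged
dZ1 = np.dot(p['W2'].T, dZ2) * dReLU(c['A1'])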
Upvotes: 1