Reputation: 53886
Here I'm attempting to implement a two-layer neural network using numpy alone. The code below is only computing forward propagation.
The training data is two examples where the inputs are 5-dimensional and the outputs are 4-dimensional. When I attempt to run my network:
# Two Layer Neural network
import numpy as np
M = 2
learning_rate = 0.0001
X_train = np.asarray([[1,1,1,1,1] , [1,1,1,1,1]])
Y_train = np.asarray([[0,0,0,0] , [1,0,0,0]])
X_trainT = X_train.T
Y_trainT = Y_train.T
def sigmoid(z):
    s = 1 / (1 + np.exp(-z))
    return s
w1=np.zeros((Y_trainT.shape[0], X_trainT.shape[0]))
b1=np.zeros((Y_trainT.shape[0], 1))
A1 = sigmoid(np.dot(w1 , X_trainT))
w2=np.zeros((A1.shape[0], w1.shape[0]))
b2=np.zeros((A1.shape[0], 1))
A2 = sigmoid(np.dot(w2 , A1))
# forward propagation
dw1 = ( 1 / M ) * np.dot((A1 - A2) , X_trainT.T / M)
db1 = (A1 - A2).mean(axis=1, keepdims=True)
w1 = w1 - learning_rate * dw1
b1 = b1 - learning_rate * db1
dw2 = ( 1 / M ) * np.dot((A2 - A1) , Y_trainT.T / M)
db2 = (A2 - Y_trainT).mean(axis=1, keepdims=True)
w2 = w2 - learning_rate * dw2
b2 = b2 - learning_rate * db2
Y_prediction_train = sigmoid(np.dot(w2 , X_train) +b2)
print(Y_prediction_train.T)
I receive this error:
ValueError Traceback (most recent call last)
<ipython-input-42-f0462b5940a4> in <module>()
36 b2 = b2 - learning_rate * db2
37
---> 38 Y_prediction_train = sigmoid(np.dot(w2 , X_train) +b2)
39 print(Y_prediction_train.T)
ValueError: shapes (4,4) and (2,5) not aligned: 4 (dim 1) != 2 (dim 0)
I seem to have gone astray in my linear algebra, but I'm not sure where.
Printing the weights and corresponding derivatives:
print(w1.shape)
print(w2.shape)
print(dw1.shape)
print(dw2.shape)
prints:
(4, 5)
(4, 4)
(4, 5)
(4, 4)
How do I incorporate the 5 dimensions of the training examples into this network?
Have I implemented forward propagation correctly?
Following @Imran's answer, I'm now using this network:
# Two Layer Neural network
import numpy as np
M = 2
learning_rate = 0.0001
X_train = np.asarray([[1,0,1,1,1] , [1,1,1,1,1]])
Y_train = np.asarray([[0,1,0,0] , [1,0,0,0]])
X_trainT = X_train.T
Y_trainT = Y_train.T
def sigmoid(z):
    s = 1 / (1 + np.exp(-z))
    return s
w1=np.zeros((Y_trainT.shape[0], X_trainT.shape[0]))
b1=np.zeros((Y_trainT.shape[0], 1))
A1 = sigmoid(np.dot(w1 , X_trainT))
w2=np.zeros((A1.shape[0], w1.shape[0]))
b2=np.zeros((A1.shape[0], 1))
A2 = sigmoid(np.dot(w2 , A1))
# forward propagation
dw1 = ( 1 / M ) * np.dot((A1 - A2) , X_trainT.T / M)
db1 = (A1 - A2).mean(axis=1, keepdims=True)
w1 = w1 - learning_rate * dw1
b1 = b1 - learning_rate * db1
dw2 = ( 1 / M ) * np.dot((A2 - A1) , Y_trainT.T / M)
db2 = (A2 - Y_trainT).mean(axis=1, keepdims=True)
w2 = w2 - learning_rate * dw2
b2 = b2 - learning_rate * db2
Y_prediction_train = sigmoid(np.dot(w2 , A1) +b2)
print(Y_prediction_train.T)
which prints:
[[ 0.5 0.5 0.4999875 0.4999875]
[ 0.5 0.5 0.4999875 0.4999875]]
I think
dw2 = ( 1 / M ) * np.dot((A2 - A1) , Y_trainT.T / M)
should instead be
dw2 = ( 1 / M ) * np.dot((A2 - A1) , A1.T / M)
in order to propagate differences from hidden layer 1 to hidden layer 2. Is this correct?
Upvotes: 2
Views: 427
Reputation: 13468
Y_prediction_train = sigmoid(np.dot(w2 , X_train) +b2)
w2 is the weight matrix for your second hidden layer. This should never be multiplied by your input, X_train.
To obtain a prediction, you need to factor forward propagation into its own function that takes an input X, first computes A1 = sigmoid(np.dot(w1 , X)), and then returns the result of A2 = sigmoid(np.dot(w2 , A1)).
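As a minimal sketch (reusing the variable names from your code, and adding the biases b1 and b2 the way your final prediction line already adds b2), that function could look like:
def forward_propagation(X, w1, b1, w2, b2):
    # hidden layer: (4, 5) weights times (5, M) inputs -> (4, M) activations
    A1 = sigmoid(np.dot(w1, X) + b1)
    # output layer: (4, 4) weights times (4, M) activations -> (4, M) outputs
    A2 = sigmoid(np.dot(w2, A1) + b2)
    return A1, A2

A1, A2 = forward_propagation(X_trainT, w1, b1, w2, b2)
Y_prediction_train = A2
print(Y_prediction_train.T)
This way the prediction always goes input -> layer 1 -> layer 2, and the shapes line up at every step.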
UPDATE:
I think dw2 = ( 1 / M ) * np.dot((A2 - A1) , Y_trainT.T / M) should instead be dw2 = ( 1 / M ) * np.dot((A2 - A1) , A1.T / M) in order to propagate differences from hidden layer 1 to hidden layer 2. Is this correct?
Backpropagation propagates errors backwards. The first step is to calculate the gradient of the loss function with respect to your outputs, which will be A2-Y if you are using Mean Squared Error. This is then fed into your terms for the gradients of the loss with respect to the weights and biases of layer 2, and so on back to layer 1. You don't want to propagate anything from layer 1 to layer 2 during backprop.
It looks like you almost have it right in your updated question, but I think you want:
dW2 = ( 1 / M ) * np.dot((A2 - Y) , A1.T)
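Putting the pieces together, here is a minimal sketch of what the full backward pass could look like with these shapes, assuming sigmoid activations in both layers and using A2 - Y_trainT as the output-layer error term as above (the A1 * (1 - A1) factor is the sigmoid derivative, which your current code does not include; variable names follow your code):
# output layer error term, shape (4, M)
dZ2 = A2 - Y_trainT
dW2 = (1 / M) * np.dot(dZ2, A1.T)          # shape (4, 4), matches w2
db2 = dZ2.mean(axis=1, keepdims=True)      # shape (4, 1)

# propagate the error back through w2 and the sigmoid derivative of layer 1
dZ1 = np.dot(w2.T, dZ2) * A1 * (1 - A1)    # shape (4, M)
dW1 = (1 / M) * np.dot(dZ1, X_trainT.T)    # shape (4, 5), matches w1
db1 = dZ1.mean(axis=1, keepdims=True)      # shape (4, 1)

# gradient descent updates
w2 = w2 - learning_rate * dW2
b2 = b2 - learning_rate * db2
w1 = w1 - learning_rate * dW1
b1 = b1 - learning_rate * db1
This is only one common formulation (sigmoid everywhere, mean over the M examples); the key point is that each layer's gradient uses the activations feeding into it and the error flowing back from the layer above.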
A couple more notes:
Upvotes: 1