blue-sky

Reputation: 53886

Dimensions of 2 hidden layer neural network not correlating

Here I'm attempting to implement a 2-layer neural network using numpy alone. The code below only computes forward propagation.

The training data consists of two examples, where the inputs are 5-dimensional and the outputs are 4-dimensional. When I attempt to run my network:

# Two Layer Neural network

import numpy as np

M = 2
learning_rate = 0.0001

X_train = np.asarray([[1,1,1,1,1] , [1,1,1,1,1]])
Y_train = np.asarray([[0,0,0,0] , [1,0,0,0]])

X_trainT = X_train.T
Y_trainT = Y_train.T

def sigmoid(z):
    s = 1 / (1 + np.exp(-z))  
    return s

w1=np.zeros((Y_trainT.shape[0], X_trainT.shape[0]))
b1=np.zeros((Y_trainT.shape[0], 1))
A1 = sigmoid(np.dot(w1 , X_trainT))

w2=np.zeros((A1.shape[0], w1.shape[0]))
b2=np.zeros((A1.shape[0], 1))
A2 = sigmoid(np.dot(w2 , A1))

# forward propagation

dw1 =  ( 1 / M ) * np.dot((A1 - A2) , X_trainT.T / M)
db1 =  (A1 - A2).mean(axis=1, keepdims=True)
w1 = w1 - learning_rate * dw1
b1 = b1 - learning_rate * db1

dw2 =  ( 1 / M ) * np.dot((A2 - A1) , Y_trainT.T / M)
db2 =  (A2 - Y_trainT).mean(axis=1, keepdims=True)
w2 = w2 - learning_rate * dw2
b2 = b2 - learning_rate * db2

Y_prediction_train = sigmoid(np.dot(w2 , X_train) +b2)
print(Y_prediction_train.T)

I receive this error:

ValueError                                Traceback (most recent call last)
<ipython-input-42-f0462b5940a4> in <module>()
     36 b2 = b2 - learning_rate * db2
     37 
---> 38 Y_prediction_train = sigmoid(np.dot(w2 , X_train) +b2)
     39 print(Y_prediction_train.T)

ValueError: shapes (4,4) and (2,5) not aligned: 4 (dim 1) != 2 (dim 0)

I seem to have gone astray in my linear algebra, but I'm not sure where.

Printing the weights and corresponding derivatives:

print(w1.shape)
print(w2.shape)
print(dw1.shape)
print(dw2.shape)

prints:

(4, 5)
(4, 4)
(4, 5)
(4, 4)

How do I incorporate the 5-dimensional training examples into this network?

Have I implemented forward propagation correctly?

Following @Imran's answer, I'm now using this network:

# Two Layer Neural network

import numpy as np

M = 2
learning_rate = 0.0001

X_train = np.asarray([[1,0,1,1,1] , [1,1,1,1,1]])
Y_train = np.asarray([[0,1,0,0] , [1,0,0,0]])

X_trainT = X_train.T
Y_trainT = Y_train.T

def sigmoid(z):
    s = 1 / (1 + np.exp(-z))  
    return s

w1=np.zeros((Y_trainT.shape[0], X_trainT.shape[0]))
b1=np.zeros((Y_trainT.shape[0], 1))
A1 = sigmoid(np.dot(w1 , X_trainT))

w2=np.zeros((A1.shape[0], w1.shape[0]))
b2=np.zeros((A1.shape[0], 1))
A2 = sigmoid(np.dot(w2 , A1))

# forward propagation

dw1 =  ( 1 / M ) * np.dot((A1 - A2) , X_trainT.T / M)
db1 =  (A1 - A2).mean(axis=1, keepdims=True)
w1 = w1 - learning_rate * dw1
b1 = b1 - learning_rate * db1

dw2 =  ( 1 / M ) * np.dot((A2 - A1) , Y_trainT.T / M)
db2 =  (A2 - Y_trainT).mean(axis=1, keepdims=True)
w2 = w2 - learning_rate * dw2
b2 = b2 - learning_rate * db2

Y_prediction_train = sigmoid(np.dot(w2 , A1) +b2)
print(Y_prediction_train.T)

which prints:

[[ 0.5        0.5        0.4999875  0.4999875]
 [ 0.5        0.5        0.4999875  0.4999875]]

I think dw2 = ( 1 / M ) * np.dot((A2 - A1) , Y_trainT.T / M) should instead be dw2 = ( 1 / M ) * np.dot((A2 - A1) , A1.T / M), in order to propagate differences from hidden layer 1 to hidden layer 2. Is this correct?

Upvotes: 2

Views: 427

Answers (1)

Imran

Reputation: 13468

Y_prediction_train = sigmoid(np.dot(w2 , X_train) +b2)

w2 is the weight matrix for your second hidden layer. This should never be multiplied by your input, X_train.

To obtain a prediction, you need to factor forward propagation into its own function that takes an input X, first computes A1 = sigmoid(np.dot(w1 , X)), and then returns the result of A2 = sigmoid(np.dot(w2 , A1)).
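For example, a minimal sketch of such a function, reusing the sigmoid, w1, b1, w2, and b2 already defined in the question (adding the bias terms, which the original forward pass defines but never uses, is an assumption here):

def forward_propagation(X, w1, b1, w2, b2):
    # hidden layer activations, shape (4, number of examples)
    A1 = sigmoid(np.dot(w1, X) + b1)
    # output layer activations, shape (4, number of examples)
    A2 = sigmoid(np.dot(w2, A1) + b2)
    return A1, A2

# Predictions use the transposed input, exactly as in training
A1, A2 = forward_propagation(X_trainT, w1, b1, w2, b2)
Y_prediction_train = A2
print(Y_prediction_train.T)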

UPDATE:

I think dw2 = ( 1 / M ) * np.dot((A2 - A1) , Y_trainT.T / M) should instead be dw2 = ( 1 / M ) * np.dot((A2 - A1) , A1.T / M), in order to propagate differences from hidden layer 1 to hidden layer 2. Is this correct?

Backpropagation propagates errors backwards. The first step is to calculate the gradient of the loss function with respect to your outputs, which will be A2 - Y if you are using Mean Squared Error. This is then fed into the terms for the gradients of the loss with respect to the weights and biases of layer 2, and so on back to layer 1. You don't want to propagate anything from layer 1 to layer 2 during backprop.

It looks like you almost have it right in your updated question, but I think you want:

dW2 = ( 1 / M ) * np.dot((A2 - Y) , A1.T)
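For reference, here is a minimal sketch of the full backward pass built around that formula. It assumes the error signal at the output is simply A2 - Y_trainT (as above) and that both layers use sigmoid activations, so the derivative of an activation a is a * (1 - a); the variable names follow the question's code:

# output-layer error and gradients
dZ2 = A2 - Y_trainT                       # shape (4, M)
dW2 = (1 / M) * np.dot(dZ2, A1.T)         # shape (4, 4), matches w2
db2 = dZ2.mean(axis=1, keepdims=True)     # shape (4, 1)

# error propagated back to the hidden layer
dZ1 = np.dot(w2.T, dZ2) * A1 * (1 - A1)   # shape (4, M)
dW1 = (1 / M) * np.dot(dZ1, X_trainT.T)   # shape (4, 5), matches w1
db1 = dZ1.mean(axis=1, keepdims=True)     # shape (4, 1)

# gradient descent updates
w2 = w2 - learning_rate * dW2
b2 = b2 - learning_rate * db2
w1 = w1 - learning_rate * dW1
b1 = b1 - learning_rate * db1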

A couple more notes:

  1. You are initializing your weights as zeros. This will not allow the neural network to break symmetry during training, and you will end up with the same weights at every neuron. You should try initializing with random weights in the range [-1,1].
  2. You should put your forward- and backpropagation steps in a loop so you can run them for multiple epochs while your error is still improving (a rough sketch combining both points follows this list).
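A rough sketch combining both points, using the question's variables; the hidden size of 4 and the epoch count of 1000 are arbitrary choices for illustration:

n_in, n_hidden, n_out = X_trainT.shape[0], 4, Y_trainT.shape[0]

# random initialization in [-1, 1] so the neurons can break symmetry
w1 = np.random.uniform(-1, 1, (n_hidden, n_in))
b1 = np.zeros((n_hidden, 1))
w2 = np.random.uniform(-1, 1, (n_out, n_hidden))
b2 = np.zeros((n_out, 1))

for epoch in range(1000):
    # forward pass
    A1 = sigmoid(np.dot(w1, X_trainT) + b1)
    A2 = sigmoid(np.dot(w2, A1) + b2)

    # backward pass (same gradients as the sketch above)
    dZ2 = A2 - Y_trainT
    dW2 = (1 / M) * np.dot(dZ2, A1.T)
    db2 = dZ2.mean(axis=1, keepdims=True)
    dZ1 = np.dot(w2.T, dZ2) * A1 * (1 - A1)
    dW1 = (1 / M) * np.dot(dZ1, X_trainT.T)
    db1 = dZ1.mean(axis=1, keepdims=True)

    # gradient descent updates
    w2 -= learning_rate * dW2
    b2 -= learning_rate * db2
    w1 -= learning_rate * dW1
    b1 -= learning_rate * db1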

Upvotes: 1
