n_1

Reputation: 65

Wrong dimensions in XOR neural network in Python

I'm trying to build an XOR neural network in Python with one hidden layer, but I'm hitting a problem with dimensions. I can't figure out why the dimensions come out wrong in the first place, because the math looks correct to me.

The dimensions issue starts in the backpropagation part and is marked with a comment in the code. The specific error is:

  File "nn.py", line 52, in <module>
    d_a1_d_W1 = inp * deriv_sigmoid(z1) 
  File "/usr/local/lib/python3.7/site-packages/numpy/matrixlib/defmatrix.py", line 220, in __mul__
    return N.dot(self, asmatrix(other))
ValueError: shapes (1,2) and (3,1) not aligned: 2 (dim 1) != 3 (dim 0)

Additionally, why does the deriv_sigmoid function here only work if I cast to a numpy array?

Code:


import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def deriv_sigmoid(x):
    fx = np.array(sigmoid(x)) # gives dimensions issues unless I cast to array
    return fx * (1 - fx)

hiddenNeurons = 3
outputNeurons = 1
inputNeurons = 2

X = np.array( [ [0, 1]  ])
elem = np.matrix(X[0])
elem_row, elem_col = elem.shape


y = np.matrix([1])

W1 = np.random.rand(hiddenNeurons, elem_col)
b1 = np.random.rand(hiddenNeurons, 1)
W2 = np.random.rand(outputNeurons, hiddenNeurons)
b2 = np.random.rand(outputNeurons, 1)
lr = .01



for inp, ytrue in zip(X, y):
    inp = np.matrix(inp)

    # feedforward
    z1 = W1 * inp.T + b1 # get weight matrix1 * inputs + bias1
    a1 = sigmoid(z1) # get activation of hidden layer

    z2 = W2 * a1 + b2 # get weight matrix2 * activated hidden + bias 2
    a2 = sigmoid(z2) # get activated output 
    ypred = a2 # and call it ypred (y prediction)

    # backprop
    d_L_d_ypred = -2 * (ytrue - ypred) # derivative of mean squared error loss

    d_ypred_d_W2 = a1 * deriv_sigmoid(z2) # derivative of y prediction with respect to weight matrix 2
    d_ypred_d_b2 = deriv_sigmoid(z2) # derivative of y prediction with respect to bias 2

    d_ypred_d_a1 = W2 * deriv_sigmoid(z2) # derivative of y prediction with respect to hidden activation

    d_a1_d_W1 = inp * deriv_sigmoid(z1) # dimensions issue starts here ––––––––––––––––––––––––––––––––

    d_a1_d_b1 = deriv_sigmoid(b1) 

    W1 -= lr * d_L_d_ypred * d_ypred_d_a1 * d_a1_d_W1
    b1 -= lr * d_L_d_ypred * d_ypred_d_a1 * d_a1_d_b1
    W2 -= lr * d_L_d_ypred * d_ypred_d_W2
    b2 -= lr * d_L_d_ypred * d_ypred_d_b2


Upvotes: 2

Views: 182

Answers (1)

Tls Chris

Reputation: 3934

I've never worked with neural networks, so I don't fully understand what you are trying to do.

I'd guess there's some confusion as to how a * b works when a and b are matrices rather than numpy arrays. On numpy arrays, * does an element-wise multiplication; on np.matrix objects, it does a matrix multiplication.

a=np.array([[1,2],[3,4]])
b = a-1
print(b) 
# array([[0, 1],
#        [2, 3]])

a*b     # Element wise multiplication
# array([[ 0,  2],     [[ 1*0, 2*1 ], 
#        [ 6, 12]])     [ 3*2, 4*3 ]]

am = np.matrix(a)
bm = np.matrix(b)

am * bm  # Matrix (dot) multiplication
# matrix([[ 4,  7],    [[ 1*0+2*2, 1*1+2*3],
#         [ 8, 15]])    [ 3*0+4*2, 3*1+4*3]]

In the deriv_sigmoid function (without the np.array cast), if x is a (3,1) matrix then fx is a matrix with the same (3,1) shape. fx * (1 - fx) then raises an exception, because two (3,1) matrices can't be matrix-multiplied together.
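To illustrate, here's a minimal sketch; the (3,1) matrix of zeros is just a stand-in for z1 from the question:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

z1 = np.matrix(np.zeros((3, 1)))   # stand-in for the (3,1) pre-activation
fx = sigmoid(z1)                   # np.exp preserves the matrix type, so fx is a (3,1) matrix
try:
    fx * (1 - fx)                  # matrix multiplication: (3,1) x (3,1) is not aligned
except ValueError as e:
    print(e)

fx_arr = np.array(fx)              # casting to ndarray makes * element-wise again
print((fx_arr * (1 - fx_arr)).shape)   # (3, 1)
```

That's exactly why the np.array cast in your deriv_sigmoid "fixes" it: the cast silently switches * back to element-wise semantics.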

The same issue applies in the '# backprop' part of the code.

d_ypred_d_a1 = W2 * deriv_sigmoid(z2) # derivative of y prediction with respect to hidden activation
# W2 * deriv_sigmoid(z2) fails as shapes are incompatible with matrix multiplication.    
# deriv_sigmoid(z2) * W2 would work, but I guess would return incorrect values (and shape).

d_a1_d_W1 = inp * deriv_sigmoid(z1)
# This fails for the same reason: the shapes of inp and deriv_sigmoid(z1) are incompatible with matrix multiplication.

Unless you specifically need np.matrix semantics, I think using np.array throughout will make the programming easier: * stays element-wise, and @ gives you matrix multiplication where you actually want it.
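As a sketch of what that could look like (the full XOR truth table, seed, learning rate, and epoch count here are my own choices, and the gradient expressions follow the standard chain rule rather than the question's intermediate variables):

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def deriv_sigmoid(x):
    fx = sigmoid(x)
    return fx * (1 - fx)                # element-wise on ndarrays, no cast needed

rng = np.random.default_rng(0)          # fixed seed, chosen arbitrarily
W1 = rng.random((3, 2)); b1 = rng.random((3, 1))
W2 = rng.random((1, 3)); b2 = rng.random((1, 1))
lr = 0.5                                # learning rate picked for illustration

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
Y = np.array([[0], [1], [1], [0]])      # full XOR truth table

for _ in range(5000):
    for x, ytrue in zip(X, Y):
        x = x.reshape(2, 1)
        # feedforward: @ is matrix multiplication, * stays element-wise
        z1 = W1 @ x + b1                # (3,1)
        a1 = sigmoid(z1)
        z2 = W2 @ a1 + b2               # (1,1)
        ypred = sigmoid(z2)

        # backprop via the chain rule
        d_L_d_ypred = -2 * (ytrue - ypred)            # dMSE/dypred, (1,1)
        delta2 = d_L_d_ypred * deriv_sigmoid(z2)      # (1,1)
        delta1 = (W2.T @ delta2) * deriv_sigmoid(z1)  # (3,1)

        W2 -= lr * delta2 @ a1.T        # (1,3)
        b2 -= lr * delta2
        W1 -= lr * delta1 @ x.T         # (3,2)
        b1 -= lr * delta1

for x in X:                             # show the trained network's outputs
    a1 = sigmoid(W1 @ x.reshape(2, 1) + b1)
    print(x, sigmoid(W2 @ a1 + b2).item())
```

Note how every shape works out without any casting, because * is never asked to do a dot product.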

Upvotes: 1
