Reputation: 65
I'm trying to build an XOR neural network in Python with one hidden layer, but I'm hitting a problem with dimensions and I can't figure out why I'm getting the wrong dimensions in the first place, because the math looks correct to me.
The dimensions issue starts in the backpropagation part (marked with a comment in the code below). The error specifically is
File "nn.py", line 52, in <module>
d_a1_d_W1 = inp * deriv_sigmoid(z1)
File "/usr/local/lib/python3.7/site-packages/numpy/matrixlib/defmatrix.py", line 220, in __mul__
return N.dot(self, asmatrix(other))
ValueError: shapes (1,2) and (3,1) not aligned: 2 (dim 1) != 3 (dim 0)
Additionally, why does the deriv_sigmoid function here only work if I cast to a numpy array?
Code:
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def deriv_sigmoid(x):
    fx = np.array(sigmoid(x))  # gives dimensions issues unless I cast to array
    return fx * (1 - fx)
hiddenNeurons = 3
outputNeurons = 1
inputNeurons = 2

X = np.array([[0, 1]])
elem = np.matrix(X[0])
elem_row, elem_col = elem.shape
y = np.matrix([1])

W1 = np.random.rand(hiddenNeurons, elem_col)
b1 = np.random.rand(hiddenNeurons, 1)
W2 = np.random.rand(outputNeurons, hiddenNeurons)
b2 = np.random.rand(outputNeurons, 1)
lr = .01
for inp, ytrue in zip(X, y):
    inp = np.matrix(inp)

    # feedforward
    z1 = W1 * inp.T + b1  # weight matrix 1 * inputs + bias 1
    a1 = sigmoid(z1)      # activation of hidden layer
    z2 = W2 * a1 + b2     # weight matrix 2 * activated hidden + bias 2
    a2 = sigmoid(z2)      # activated output
    ypred = a2            # and call it ypred (y prediction)

    # backprop
    d_L_d_ypred = -2 * (ytrue - ypred)     # derivative of mean squared error loss
    d_ypred_d_W2 = a1 * deriv_sigmoid(z2)  # derivative of y prediction with respect to weight matrix 2
    d_ypred_d_b2 = deriv_sigmoid(z2)       # derivative of y prediction with respect to bias 2
    d_ypred_d_a1 = W2 * deriv_sigmoid(z2)  # derivative of y prediction with respect to hidden activation
    d_a1_d_W1 = inp * deriv_sigmoid(z1)    # dimensions issue starts here ––––––––––––––––––––––––
    d_a1_d_b1 = deriv_sigmoid(b1)

    W1 -= lr * d_L_d_ypred * d_ypred_d_a1 * d_a1_d_W1
    b1 -= lr * d_L_d_ypred * d_ypred_d_a1 * d_a1_d_b1
    W2 -= lr * d_L_d_ypred * d_ypred_d_W2
    b2 -= lr * d_L_d_ypred * d_ypred_d_b2
Upvotes: 2
Views: 182
Reputation: 3934
I've never tried to work with neural networks, so I don't fully understand what you are trying to do. But I'd guess the confusion is about how a * b behaves when a and b are np.matrix objects rather than numpy arrays: on numpy arrays, * does an element-wise multiplication; on np.matrix objects, it does a matrix multiplication.
a = np.array([[1, 2], [3, 4]])
b = a - 1
b
# array([[0, 1],
#        [2, 3]])
a * b  # element-wise multiplication
# array([[ 0,  2],     [[ 1*0, 2*1 ],
#        [ 6, 12]])     [ 3*2, 4*3 ]]
am = np.matrix(a)
bm = np.matrix(b)
am * bm  # matrix (dot) multiplication
# matrix([[ 4,  7],    [[ 1*0+2*2, 1*1+2*3 ],
#         [ 8, 15]])    [ 3*0+4*2, 3*1+4*3 ]]
In the deriv_sigmoid function (without the np.array cast), if x is a (3, 1) matrix then fx is a matrix of the same shape. fx * (1 - fx) then raises an exception, because a (3, 1) matrix can't be matrix-multiplied by another (3, 1) matrix.
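To illustrate (using a made-up (3, 1) z1 in place of the question's random values), the elementwise product fails on an np.matrix but works once it is cast back to an array:

```python
import numpy as np

z1 = np.matrix([[0.1], [0.2], [0.3]])  # same (3, 1) matrix shape as in the question
fx = 1 / (1 + np.exp(-z1))             # sigmoid of a matrix is still a (3, 1) matrix

# fx * (1 - fx) attempts a (3,1) @ (3,1) matrix product and raises ValueError
try:
    fx * (1 - fx)
except ValueError as e:
    print("matrix multiply failed:", e)

# Casting to an array makes * element-wise again
fx_arr = np.asarray(fx)
print(fx_arr * (1 - fx_arr))  # (3, 1) array of sigmoid derivatives
```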
The same issue applies in the '# backprop' part of the code.
d_ypred_d_a1 = W2 * deriv_sigmoid(z2)  # derivative of y prediction with respect to hidden activation
# W2 * deriv_sigmoid(z2) fails as the shapes are incompatible with matrix multiplication.
# deriv_sigmoid(z2) * W2 would run, but I guess would return incorrect values (and shape).
d_a1_d_W1 = inp * deriv_sigmoid(z1)
# This fails for the same reason: the (1, 2) shape of inp is incompatible with the (3, 1) shape of deriv_sigmoid(z1).
Unless you actually need matrix multiplication, I think using np.arrays (with * for element-wise products and the @ operator for matrix products) will make the programming easier.
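As a sketch of that suggestion (a hypothetical rewrite of the feedforward step, not your exact setup), plain arrays let you use @ where you mean a matrix product and * where you mean an element-wise one:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def deriv_sigmoid(x):
    fx = sigmoid(x)       # no np.array cast needed: * is element-wise on arrays
    return fx * (1 - fx)

rng = np.random.default_rng(0)
W1 = rng.random((3, 2))   # hidden x input
b1 = rng.random((3, 1))
W2 = rng.random((1, 3))   # output x hidden
b2 = rng.random((1, 1))

inp = np.array([[0], [1]])  # (2, 1) column vector

z1 = W1 @ inp + b1          # (3, 1): explicit matrix product
a1 = sigmoid(z1)
z2 = W2 @ a1 + b2           # (1, 1)
ypred = sigmoid(z2)
print(z1.shape, z2.shape, ypred.shape)
```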
Upvotes: 1