Reputation: 1289
I'm attempting to learn about neural networks using matrices, and in doing so I decided to set myself a challenge: train a simple neural network to output 0.5*sigmoid(x).
My guess was that a single weight would be sufficient, as W would just need to be 0.5.
But I've run into a problem regarding the loss function and the weights.
import sys
import math
import numpy as np
import random
@np.vectorize
def dlossbydw(X, W, Y):
    t = np.dot(X, W)
    Yhat = sig(t)
    n = Y - Yhat
    l = 2*n * -1 * sig(t)*(1-sig(t))*X
    return l

@np.vectorize
def sig(Z):
    return 1.0 / (1.0 + math.exp(-Z))

@np.vectorize
def toMatch(Z):
    return 0.5 * sig(Z)

def main(args):
    # This is the matrix that should be output
    Y = np.array([[toMatch(x)] for x in range(-5, 5)])
    # This is the input matrix
    X = np.array([[x] for x in range(-5, 5)])
    random.seed(5)
    r = random.random()
    # And this is the weight matrix
    W = np.array([r])
    rate = 1e-1
    for i in range(1000):
        print("dlossbydw: " + str(dlossbydw(X, W, Y)))
        # ???
    print("expected out: " + str(Y))
    print("post training: " + str(sig(np.dot(X, W))))

if __name__ == "__main__":
    main(sys.argv[1:])
The weight matrix W is 1x1 (there is only one weight connecting the input neuron to the output neuron), but when I do the math to calculate the gradient of the loss with respect to W, I get a 10x1 matrix. My guess is that it gives me the gradient for each input in X, but what do I do with this? Each value in it is different from the last, which is strange considering they should all push W in the same direction (towards 0.5).
I must be misunderstanding something, or confusing myself over a simple mistake.
Can someone please clarify what I'm doing wrong here?
What should I be doing at the place marked?
Is this correct?
Thank you.
Upvotes: 2
Views: 431
Reputation: 1659
If you have a NN with one weight, one input, no bias, and a sigmoid activation function, the output is calculated like this:
y = sig(w*x)
If you're trying to match 0.5*sig(x), the weight w is not necessarily 0.5.
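To see why, consider x = 0: the network outputs sig(w*0) = sig(0) = 0.5 for any weight w, while the target is 0.5*sig(0) = 0.25, so no single weight can reproduce the target exactly; gradient descent can only find a best fit over your ten inputs. A minimal check:

import math

def sig(z):
    return 1.0 / (1.0 + math.exp(-z))

# The network's output at x = 0 is sig(w*0) = sig(0) for any weight w:
print(sig(0.0))        # 0.5  (network output at x = 0, regardless of w)
print(0.5 * sig(0.0))  # 0.25 (the target value 0.5*sig(0))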
I played around with your code a little:
Make sure that your weight matrix has the right shape; it currently has shape (1,). I fixed that like this:
random.seed(5)
W = np.array([random.random()])
W.shape = (1, 1)
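With W shaped (1,1), np.dot(X, W) comes out as (10,1), matching Y's shape, so the error Y - Yhat lines up element-wise instead of broadcasting into something unintended.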
Your math is otherwise correct. The problem is that you have to distinguish between component-wise multiplication and the dot product in numpy, like this:
l = np.dot(X.T, 2*n * -1 * sig(t)*(1-sig(t)))
Also get rid of np.vectorize for dlossbydw: it shouldn't be applied to individual elements of the vector, because you need the dot product at the end to collapse the per-sample terms into one scalar value.
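Putting both fixes together, the corrected dlossbydw could look like this (a plain function, no decorator):

def dlossbydw(X, W, Y):
    t = np.dot(X, W)   # (10,1) pre-activations
    Yhat = sig(t)      # (10,1) predictions
    n = Y - Yhat       # (10,1) per-sample errors
    # X.T has shape (1,10), so this dot product sums the per-sample
    # gradient contributions into a single 1x1 matrix matching W.
    return np.dot(X.T, 2*n * -1 * sig(t)*(1-sig(t)))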
l is now a 1x1 matrix, which gives you the gradient by which to adapt your weight:
for i in range(1000):
    l = dlossbydw(X, W, Y)
    print("dlossbydw: " + str(l))
    W = W - l
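Note that this update uses an implicit step size of 1; if you want to use the rate variable from your original code, the update would be W = W - rate*l instead, which will likely need more iterations to converge.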
Upvotes: 2