Reputation: 51
I'm learning about neural networks, specifically looking at MLPs with a back-propagation implementation. I'm trying to implement my own network in Python, and I thought I'd look at some other libraries before I started. After some searching I found Neil Schemenauer's Python implementation bpnn.py (http://arctrix.com/nas/python/bpnn.py).
Having worked through the code and read the first part of Christopher M. Bishop's book 'Neural Networks for Pattern Recognition', I found an issue in the backPropagate function:
# calculate error terms for output
output_deltas = [0.0] * self.no
for k in range(self.no):
    error = targets[k]-self.ao[k]
    output_deltas[k] = dsigmoid(self.ao[k]) * error
The line of code that calculates the error is different in Bishop's book. On page 145, equation 4.41, he defines the output unit's error as:
d_k = y_k - t_k
where y_k are the outputs and t_k are the targets. (I'm using _ to represent a subscript.) So my question is: should this line of code:
error = targets[k]-self.ao[k]
in fact be:
error = self.ao[k] - targets[k]
I'm most likely completely wrong, but could someone please help clear up my confusion? Thanks.
Upvotes: 5
Views: 4799
Reputation: 129
In actual code, we often calculate the NEGATIVE gradient (of the loss with respect to w) and use w += eta*grad to update the weights. Strictly speaking, that is gradient ascent on the negated gradient.
In some textbooks, the POSITIVE gradient is calculated and w -= eta*grad is used to update the weights.
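For instance (a minimal sketch with made-up scalar values, not code from bpnn.py; it simply assumes dy/dw = 1 so the two conventions can be compared directly):

# loss = 0.5 * (y - t)**2, so dloss/dw = (y - t) * dy/dw; assume dy/dw = 1 here
eta, w, y, t = 0.1, 0.5, 0.8, 1.0

# convention 1 (the sign used in bpnn.py's error term): NEGATIVE gradient, added
neg_grad = (t - y) * 1.0
w1 = w + eta * neg_grad

# convention 2 (as in many textbooks): POSITIVE gradient, subtracted
pos_grad = (y - t) * 1.0
w2 = w - eta * pos_grad

assert abs(w1 - w2) < 1e-12   # both conventions produce the same updated weight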
Upvotes: 0
Reputation: 3977
You can study this MLP implementation from the Padasip library. And the documentation is here.
Upvotes: 0
Reputation: 31597
It all depends on the error measure you use. To give just a few examples of error measures (for brevity, I'll use ys to mean a vector of n outputs and ts to mean a vector of n targets):
mean squared error (MSE):
sum((y - t) ** 2 for (y, t) in zip(ys, ts)) / n
mean absolute error (MAE):
sum(abs(y - t) for (y, t) in zip(ys, ts)) / n
mean logistic error (MLE):
sum(-log(y) * t - log(1 - y) * (1 - t) for (y, t) in zip(ys, ts)) / n
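As a quick illustration (a minimal sketch; the ys and ts values below are made up), the three measures can be computed directly:

from math import log

n = 3
ys = [0.9, 0.2, 0.7]   # network outputs
ts = [1.0, 0.0, 1.0]   # target outputs

mse = sum((y - t) ** 2 for (y, t) in zip(ys, ts)) / n
mae = sum(abs(y - t) for (y, t) in zip(ys, ts)) / n
mle = sum(-log(y) * t - log(1 - y) * (1 - t) for (y, t) in zip(ys, ts)) / n

print(mse, mae, mle)   # roughly 0.047, 0.2, 0.228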
Which one you use depends entirely on the context. MSE and MAE can be used when the target outputs can take any values, and MLE gives very good results when your target outputs are either 0 or 1 and when y is in the open range (0, 1).
With that said, I haven't seen the errors y - t or t - y used before (I'm not very experienced in machine learning myself). As far as I can see, the source code you provided doesn't square the difference or use the absolute value; are you sure the book doesn't either? The way I see it, y - t or t - y can't be very good error measures, and here's why:
n = 2 # We only have two output neurons
ts = [ 0, 1 ] # Our target outputs
ys = [ 0.999, 0.001 ] # Our sigmoid outputs
# Notice that your outputs are the exact opposite of what you want them to be.
# Yet, if you use (y - t) or (t - y) to measure your error for each neuron and
# then sum up to get the total error of the network, you get 0.
t_minus_y = (0 - 0.999) + (1 - 0.001)
y_minus_t = (0.999 - 0) + (0.001 - 1)
Edit: Per alfa's comment, in the book, y - t is actually the derivative of MSE. In that case, t - y is incorrect. Note, however, that the actual derivative of MSE is 2 * (y - t) / n, not simply y - t.
If you don't divide by n (so you actually have a summed squared error (SSE), not a mean squared error), then the derivative would be 2 * (y - t). Furthermore, if you use SSE / 2 as your error measure, then the 1 / 2 and the 2 in the derivative cancel out and you are left with y - t.
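As a quick numeric check (a minimal sketch with made-up values), a central finite difference confirms that the derivative of SSE / 2 with respect to a single output y is y - t:

y, t, eps = 0.8, 1.0, 1e-6

half_sse = lambda v: 0.5 * (v - t) ** 2                        # SSE / 2 for one output
numeric = (half_sse(y + eps) - half_sse(y - eps)) / (2 * eps)  # central difference
analytic = y - t

assert abs(numeric - analytic) < 1e-6   # both are -0.2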
Upvotes: 2
Reputation: 3098
You have to backpropagate the derivative of 0.5*(y-t)^2 or 0.5*(t-y)^2 with respect to y, which in both cases is
y - t = (y - t)*(+1) = (t - y)*(-1)
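A quick check (a minimal sketch with made-up values) showing that both forms give the same derivative:

y, t, eps = 0.3, 1.0, 1e-6

f1 = lambda v: 0.5 * (v - t) ** 2
f2 = lambda v: 0.5 * (t - v) ** 2

d1 = (f1(y + eps) - f1(y - eps)) / (2 * eps)   # derivative of 0.5*(y-t)^2
d2 = (f2(y + eps) - f2(y - eps)) / (2 * eps)   # derivative of 0.5*(t-y)^2

# both equal y - t = -0.7 (up to floating-point error)
assert abs(d1 - (y - t)) < 1e-6 and abs(d2 - (y - t)) < 1e-6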
Upvotes: 0