FidesFacitFidem

Reputation: 51

Python Neural Network Backpropagation

I'm learning about neural networks, specifically looking at MLPs with a back-propagation implementation. I'm trying to implement my own network in Python, and I thought I'd look at some other libraries before I started. After some searching I found Neil Schemenauer's Python implementation, bpnn.py (http://arctrix.com/nas/python/bpnn.py).

Having worked through the code and read the first part of Christopher M. Bishop's book 'Neural Networks for Pattern Recognition', I found an issue in the backPropagate function:

# calculate error terms for output
output_deltas = [0.0] * self.no
for k in range(self.no):
    error = targets[k]-self.ao[k]
    output_deltas[k] = dsigmoid(self.ao[k]) * error

The line of code that calculates the error is different from the one in Bishop's book. On page 145, equation 4.41, he defines the output unit's error as:

d_k = y_k - t_k

where y_k are the outputs and t_k are the targets (I'm using _ to represent a subscript). So my question is: should this line of code:

error = targets[k]-self.ao[k]

in fact be:

error = self.ao[k] - targets[k]

I'm most likely completely wrong, but could someone please help clear up my confusion? Thanks.

Upvotes: 5

Views: 4799

Answers (4)

Travis

Reputation: 129

In actual code, we often calculate the NEGATIVE gradient (of the loss with respect to w) and use w += eta * grad to update the weight, so the update is written in the form of a gradient ascent step.

In some textbooks the POSITIVE gradient is calculated instead, and w -= eta * grad is used to update the weight. Both conventions produce the same update, as shown below.
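
As a quick sketch (my own toy numbers, not from any particular library), both conventions give exactly the same weight:

eta = 0.1        # learning rate
w = 0.5          # a single weight
dloss_dw = 0.3   # POSITIVE gradient of the loss with respect to w

# Textbook convention: subtract the positive gradient.
w_textbook = w - eta * dloss_dw

# Code convention: compute the NEGATIVE gradient and add it.
grad = -dloss_dw
w_code = w + eta * grad

assert w_textbook == w_code   # both are 0.47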

Upvotes: 0

matousc

Reputation: 3977

You can study this implementation of an MLP from the Padasip library.

The documentation is here.

Upvotes: 0

Paul Manta

Reputation: 31597

It all depends on the error measure you use. To give just a few examples of error measures (for brevity, I'll use ys to mean a vector of n outputs and ts to mean a vector of n targets):

mean squared error (MSE):
    sum((y - t) ** 2 for (y, t) in zip(ys, ts)) / n

mean absolute error (MAE):
    sum(abs(y - t) for (y, t) in zip(ys, ts)) / n

mean logistic error (MLE):
    sum(-log(y) * t - log(1 - y) * (1 - t) for (y, t) in zip(ys, ts)) / n 

Which one you use depends entirely on the context. MSE and MAE can be used when the target outputs can take any values, and MLE gives very good results when your target outputs are either 0 or 1 and y is in the open range (0, 1).
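
If it helps, here is a runnable sketch of the three measures above (the function names are mine, not from any library):

from math import log

def mse(ys, ts):
    # mean squared error
    return sum((y - t) ** 2 for y, t in zip(ys, ts)) / len(ys)

def mae(ys, ts):
    # mean absolute error
    return sum(abs(y - t) for y, t in zip(ys, ts)) / len(ys)

def mle(ys, ts):
    # mean logistic error; assumes each y is in the open range (0, 1)
    return sum(-log(y) * t - log(1 - y) * (1 - t) for y, t in zip(ys, ts)) / len(ys)

ys = [0.9, 0.2]
ts = [1, 0]
print(mse(ys, ts), mae(ys, ts), mle(ys, ts))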

With that said, I haven't seen the errors y - t or t - y used before (I'm not very experienced in machine learning myself). As far as I can see, the source code you provided doesn't square the difference or use the absolute value; are you sure the book doesn't either? The way I see it, y - t or t - y can't be very good error measures, and here's why:

n = 2                 # We only have two output neurons
ts = [ 0, 1 ]         # Our target outputs
ys = [ 0.999, 0.001 ] # Our sigmoid outputs

# Notice that your outputs are the exact opposite of what you want them to be.
# Yet, if you use (y - t) or (t - y) to measure your error for each neuron and
# then sum up to get the total error of the network, you get 0.
t_minus_y = (0 - 0.999) + (1 - 0.001)
y_minus_t = (0.999 - 0) + (0.001 - 1)

Edit: Per alfa's comment, in the book, y - t is actually the derivative of MSE. In that case, t - y is incorrect. Note, however, that the actual derivative of MSE is 2 * (y - t) / n, not simply y - t.

If you don't divide by n (so you actually have a summed squared error (SSE), not a mean squared error), then the derivative would be 2 * (y - t). Furthermore, if you use SSE / 2 as your error measure, then the 1 / 2 and the 2 in the derivative cancel out and you are left with y - t.
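
A quick finite-difference sketch of those three scalings (my own arbitrary numbers, not from the post):

# For a single output y with target t:
#   MSE:     (y - t)**2 / n   ->  derivative 2 * (y - t) / n
#   SSE:     (y - t)**2       ->  derivative 2 * (y - t)
#   SSE / 2: (y - t)**2 / 2   ->  derivative y - t
y, t, n, eps = 0.7, 1.0, 4, 1e-6

def numeric_grad(f, x):
    return (f(x + eps) - f(x - eps)) / (2 * eps)

print(numeric_grad(lambda v: (v - t) ** 2 / n, y), 2 * (y - t) / n)  # both about -0.15
print(numeric_grad(lambda v: (v - t) ** 2, y), 2 * (y - t))          # both about -0.6
print(numeric_grad(lambda v: (v - t) ** 2 / 2, y), y - t)            # both about -0.3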

Upvotes: 2

alfa

Reputation: 3098

You have to backpropagate the derivative of

0.5*(y-t)^2 or 0.5*(t-y)^2 with respect to y

which in both cases is

y - t

(by the chain rule: differentiating 0.5*(y-t)^2 gives (y-t)*(+1), while differentiating 0.5*(t-y)^2 gives (t-y)*(-1), and both equal y - t).
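
A quick numeric sketch of that chain-rule step (my own example values):

# Differentiate 0.5 * (t - y)**2 with respect to y by finite differences
# and compare with (t - y) * (-1), i.e. y - t.
y, t, eps = 0.2, 1.0, 1e-6

f = lambda v: 0.5 * (t - v) ** 2
numeric = (f(y + eps) - f(y - eps)) / (2 * eps)
print(numeric)         # approximately -0.8
print((t - y) * (-1))  # -0.8, which is y - t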

Upvotes: 0
