vsnyc

Reputation: 2257

Error of output neuron

In a neural network with backpropagation, after we are done with the forward pass, the next step is to calculate the error of the output neuron. The figure below shows that the error of the output neuron is δ = z - y. The full text for backpropagation can be found here. I get this part.

[Figure: MLP neural net with backpropagation]

If the activation function for the neurons is the sigmoid function, I have read in another article that the error should not be computed as a simple difference; rather, it would be δ = y*(1-y)*(z - y)

Could someone familiar with this explain the rationale behind it? Why does having a sigmoid activation function cause the error computation to become δ = y*(1-y)*(z - y) and no longer be δ = (z - y)?
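For concreteness, here is a small numeric comparison of the two formulas for a sigmoid output neuron (the pre-activation and target values below are made up for illustration):

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

a = 0.5            # pre-activation of the output neuron (made-up value)
y = sigmoid(a)     # neuron output after the sigmoid
z = 1.0            # target value

simple_delta = z - y                    # plain difference
sigmoid_delta = y * (1 - y) * (z - y)   # difference scaled by the sigmoid derivative

print(simple_delta, sigmoid_delta)
```

The second formula is the first one multiplied by y*(1-y), the derivative of the sigmoid at the neuron's output, so it is always smaller in magnitude.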

The only similar question I found was this one; however, the asker did not ask why the error is computed this way.

Upvotes: 1

Views: 1571

Answers (2)

2PacIsAlive

Reputation: 85

Using δ = (z - y) as an error function assumes that the expected output is 1 or 0 (the unit should either be maximally activated or not at all). This error function is used for output units. Hidden-layer units, however, are not meant to be maximally or minimally activated; their activations are supposed to take precise intermediate values. Thus, the error function must propagate the error using the derivative of the sigmoid function, yielding the final formula δ = y*(1-y)*(z - y).
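A minimal sketch of the propagation this answer describes: a hidden unit's delta is the error fed back from downstream units, scaled by the sigmoid derivative of the hidden unit's own output (the weights and deltas here are made-up illustrative values, not from the question's network):

```python
def hidden_delta(y_hidden, downstream_weights, downstream_deltas):
    """Delta of a hidden unit: sigmoid derivative times the back-propagated error."""
    back_error = sum(w * d for w, d in zip(downstream_weights, downstream_deltas))
    return y_hidden * (1 - y_hidden) * back_error

# Example: one hidden unit with output 0.7 feeding two output units
# through weights 0.5 and -0.3, whose deltas are 0.1 and 0.2.
print(hidden_delta(0.7, [0.5, -0.3], [0.1, 0.2]))
```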

Upvotes: 0

Ibraim Ganiev

Reputation: 9390

Forget about all these trendy names like back propagation; it's nothing more than a simple task of mathematical optimization. One possible way to optimize a cost function is the gradient descent iterative algorithm; to use it you should know the derivative of the target function, i.e. you should know how to vary your parameters to minimize the function. Fortunately, the derivative in some sense shows how your function would change if you changed some parameter.

In your case you have two different optimization tasks.

The first target function is

E(w) = (1/2) * Σ_i (z_i - y_i)²,   with y_i = w · x_i

where index i denotes a particular sample from the dataset. Its derivative is simply

∂E/∂w = -Σ_i (z_i - y_i) * x_i

so the error term for each sample is just δ = z - y.

But if you add a sigmoid function to your hypothesis,

y_i = σ(w · x_i),   where σ(a) = 1 / (1 + e^(-a))

you should compute your derivative according to the Chain_rule, because the sigmoid function is nonlinear. Writing a = w · x for the pre-activation:

∂E/∂w = (∂E/∂y) * (∂y/∂a) * (∂a/∂w)

∂E/∂y = -(z - y)

∂y/∂a = σ(a) * (1 - σ(a)) = y * (1 - y)

∂a/∂w = x

So:

∂E/∂w = -Σ_i y_i * (1 - y_i) * (z_i - y_i) * x_i,   i.e. δ = y * (1 - y) * (z - y)
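The chain-rule result can be verified with a finite-difference check. A sketch, using E = ½(z - y)² with y = σ(a), so dE/da should equal -y*(1-y)*(z - y) (the pre-activation and target values are made up):

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def cost(a, z):
    """Squared-error cost of a single sigmoid unit for pre-activation a and target z."""
    y = sigmoid(a)
    return 0.5 * (z - y) ** 2

a, z = 0.3, 1.0            # made-up pre-activation and target
y = sigmoid(a)

analytic = -y * (1 - y) * (z - y)                         # dE/da from the chain rule
eps = 1e-6
numeric = (cost(a + eps, z) - cost(a - eps, z)) / (2 * eps)  # central difference

print(analytic, numeric)
```

The two numbers agree to many decimal places, confirming that the extra y*(1-y) factor is exactly the sigmoid derivative introduced by the chain rule.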

Upvotes: 4
