unleashed

Reputation: 925

Neural Networks - Back Propagation

Just one quick question and some clarification I need regarding Neural Networks and Back Propagation for training instances.

If anyone could base their example on something similar to this, it would be great since I am lacking simple to understand examples.

Let's say there are three colours needed to train which are red, blue and green where we represent red as this below using normalization since they are nominal values.

red = 0.4
blue = 0.7
green = 1.0

There are 3 input nodes, 2 hidden nodes and 1 output node.

I assume that random weights between -1 and 1 are assigned and multiplied against each input node's value, feeding forward through the layers and giving a network output value of 0.562, which is stored alongside the instance. Would this output value be stored alongside all three instances? How does training occur, such that the error is calculated and then back-propagated? This is what really confuses me.

Since I need to code this algorithm, it would be great to get a better understanding first.

Upvotes: 2

Views: 2428

Answers (1)

Andrew

Reputation: 617

While I don't exactly understand your example, the question of backpropagation is fairly common. Consider the simplest case: a strictly layered feed-forward network with one output node.

First you need to propagate the information forwards. It looks like you may have this already; however, make sure you keep track of each node's value after the squashing function (let's call it o), keeping one for each node.
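A minimal sketch of this forward pass, using the sigmoid as the squashing function. The layer sizes match the question (3 inputs, 2 hidden, 1 output), but the weight values here are made-up placeholders rather than anything from the original post:

```python
import math

def sigmoid(x):
    # Squashing function: maps any real input into (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical tiny network: 3 inputs -> 2 hidden -> 1 output.
inputs = [0.4, 0.7, 1.0]                      # e.g. the normalized colour values
w_ih = [[0.1, -0.2, 0.3],                     # one weight row per hidden node
        [0.4, 0.1, -0.5]]
w_ho = [0.2, -0.3]                            # one weight per hidden-to-output link

# Forward pass: keep each node's post-squash value 'o' for backpropagation later.
o_hidden = [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in w_ih]
o_output = sigmoid(sum(w * o for w, o in zip(w_ho, o_hidden)))
```

The important detail is that `o_hidden` and `o_output` are retained, since the backward pass reuses them.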

Once the forward propagation is done, for backpropagation you need to calculate the error. This is the difference between what was expected and what was given. In addition, multiply this by the derivative of the squashing function in order to give a direction for the later update (the derivation of that derivative is complicated, but using it is very simple).

Error[output] = (Expected - Actual) * o(1 - o)

Then propagate the error at each node backwards through the network. This gives an estimate of the 'responsibility' of each node for the error. So the error at each node is the error at all nodes in the next layer, weighted by the weights on each link. Again, we multiply by the derivative so we have a direction.

Error[hidden] = Sum (Error[output]*weight[hiddenToOutput]) * o(1 - o)
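The two error formulas above can be sketched like this. The starting numbers (the output value, the target, the hidden activations, and the weights) are hypothetical stand-ins, not values from the question, except that 0.562 is borrowed from the asker's example output:

```python
# Assumed values carried over from a forward pass (hypothetical numbers).
o_output = 0.562          # actual network output
expected = 1.0            # target for this training instance
o_hidden = [0.55, 0.48]   # post-squash values of the two hidden nodes
w_ho = [0.2, -0.3]        # hidden-to-output weights

# Output error: (Expected - Actual) scaled by the sigmoid derivative o(1 - o).
err_output = (expected - o_output) * o_output * (1 - o_output)

# Hidden errors: the output error pushed back along each link's weight,
# again scaled by that hidden node's own derivative o(1 - o).
err_hidden = [err_output * w * o * (1 - o) for w, o in zip(w_ho, o_hidden)]
```

With several output nodes, the `err_output * w` term becomes a sum over all outgoing links, matching the `Sum(...)` in the formula above.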

Repeat this for every layer of links (input to hidden, hidden to hidden, hidden to output) as necessary.

Finally, the training occurs by updating the weights on the links. For this we combine all the information we have to get the final update.

Weight[hiddenToOutput] = weight[hiddenToOutput] + learningRate * error[output] * input

Where input is the value that went into the link (that is, the 'o' from the previous layer), error is the error from the following layer, and learningRate is some small number (e.g. 0.01) to limit the size of our updates. An analogous calculation is done for weight[inputToHidden] and the other layers.
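The update rule above, sketched for the hidden-to-output weights. The error and activation values are hypothetical carry-overs from an earlier backward pass, not numbers from the original post:

```python
learning_rate = 0.01      # small step size to limit update magnitude

# Assumed values from the forward and backward passes (hypothetical numbers).
o_hidden = [0.55, 0.48]   # the 'input' into each hidden-to-output link
err_output = 0.108        # error at the output node, already scaled by o(1 - o)
w_ho = [0.2, -0.3]        # current hidden-to-output weights

# Gradient step: weight += learningRate * error * input, per link.
w_ho = [w + learning_rate * err_output * o for w, o in zip(w_ho, o_hidden)]
```

The input-to-hidden weights are updated the same way, using `err_hidden` and the raw input values in place of `err_output` and `o_hidden`.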

((NB: this assumes the sigmoid squashing function))

Hope this helps. Additional info can be found in lots of places. I learned from Machine Learning by Tom M. Mitchell. It has a good pseudocode section.

Upvotes: 10
