Reputation: 1689
I have been trying to understand supervised learning from the Torch tutorial:
http://code.madbits.com/wiki/doku.php?id=tutorial_supervised
and backpropagation from this page:
http://home.agh.edu.pl/~vlsi/AI/backp_t_en/backprop.html
As far as I know, the parameter update in this Torch tutorial happens in Step 4, the training procedure:
output = model:forward(inputs[i])
df_do = criterion:backward(output, targets[i])
model:backward(inputs[i], df_do)
For example, I got this:
output = -2.2799
-2.3638
-2.3183
-2.1955
-2.3377
-2.3434
-2.3740
-2.2641
-2.3449
-2.2214
[torch.DoubleTensor of size 10]
targets[i] = 9
Is df_do supposed to be this?
0
0
0
0
0
0
0
0
-1
0
[torch.DoubleTensor of size 10]
I understand that the target is 9 while the predicted class is 4 in this example, so the prediction is wrong and the 9th element of df_do is set to -1.
But why?
According to http://home.agh.edu.pl/~vlsi/AI/backp_t_en/backprop.html,
df_do should be [target (desired output) - output].
Upvotes: 1
Views: 2038
Reputation: 2160
In Torch, backprop works exactly as it does in mathematics: df_do is the derivative of the loss w.r.t. the prediction, and is therefore entirely defined by your loss function, i.e. the nn.Criterion you use.
The most famous one is the Mean Square Error (nn.MSECriterion):
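For a prediction vector p and a target vector t of the same size n, the loss (with the default size-averaging) is

loss(p, t) = (1/n) * sum_i (p_i - t_i)^2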
Note that the MSE criterion expects the target to have the same size as the prediction (a one-hot vector in the case of classification). If you choose MSE, your derivative vector df_do will be computed as:
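df_do_i = d loss / d p_i = (2/n) * (p_i - t_i)

i.e., up to a constant factor it is element-wise (prediction - target), not (target - prediction); the sign follows from differentiating the loss that is being minimized.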
The MSE criterion, however, is typically not very good for classification. A more suitable one is a likelihood criterion, which takes a probability vector as the prediction and the scalar index of the true class as the target. The aim is simply to maximize the probability of the true class, which is the same as minimizing its negative:
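For a probability vector p and the true class index y, this is

loss(p, y) = -p_y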
If we give it a log-probability vector as the prediction (the logarithm is a monotone transformation, so it doesn't affect the optimization result, but it is more computationally stable), we get the Negative Log Likelihood loss function (nn.ClassNLLCriterion):
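loss(q, y) = -q_y,   where q_i = log p_i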
In that case, df_do is as follows:
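df_do_i = -1 if i = y (the true class), and 0 otherwise

This is exactly the vector in your example: -1 at index 9, the true class, and 0 everywhere else. The predicted class (4) does not enter this gradient at all.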
In the Torch tutorial, the NLL criterion is used by default.
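As a quick check, here is a minimal sketch (assuming Torch7 with the nn package) that reproduces the df_do from your example using nn.ClassNLLCriterion:
require 'nn'
-- log-probabilities from the question (e.g. the output of nn.LogSoftMax())
local output = torch.Tensor({-2.2799, -2.3638, -2.3183, -2.1955, -2.3377,
                             -2.3434, -2.3740, -2.2641, -2.3449, -2.2214})
local target = 9  -- index of the true class
local criterion = nn.ClassNLLCriterion()
local loss  = criterion:forward(output, target)   -- equals -output[target] = 2.3449
local df_do = criterion:backward(output, target)  -- -1 at index 9, 0 elsewhere
print(loss)
print(df_do)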
Upvotes: 4