Reputation: 786
I'm debugging my constrained stochastic gradient descent algorithm, and the paper http://research.microsoft.com/pubs/192769/tricks-2012.pdf suggests checking the gradients using finite differences. I added a penalty function, but the model no longer converges, so I want to check my gradient as suggested in the paper.
I can pick an example and compute the loss for that example, but my weight vector contains ~4000 features, so I get a vector of that many partial derivatives as my gradient, while the loss is a single scalar value, so it's not possible to compute Q(z, w) + δg directly. Do I have to compute the loss for a single feature of w only? Is that what is meant by "the current w"?
Upvotes: 1
Views: 1915
Reputation: 66815
The equation in the publication looks odd because it is not described carefully. To check a gradient you usually verify that your "guessed" (analytic) gradient is close to the numerical gradient, whose i-th dimension equals
( Q(z, w + delta*e_i) - Q(z, w) ) / delta
for a small enough delta, where e_i is the i-th canonical vector (1 in the i-th dimension and 0 everywhere else). In other words, if we denote by g_i the i-th dimension of your gradient, then you need to check that
| ( Q(z, w + delta*e_i) - Q(z, w) ) / delta - g_i | < eps
Multiplying both sides by delta gives
| Q(z, w + delta*e_i) - Q(z, w) - delta * g_i | < delta*eps
which boils down to checking
| Q(z, w + delta*e_i) - ( Q(z, w) + delta * g_i ) | < delta*eps
thus checking whether
Q(z, w + delta*e_i) ≈ Q(z, w) + delta * g_i
which is exactly their equation, simply applied feature-wise (one coordinate of w at a time).
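As a rough illustration, a minimal NumPy sketch of this feature-wise check could look as follows; here loss_fn and grad_fn are hypothetical placeholders for your loss Q(z, ·) on a fixed example z and your analytic gradient, and the tolerances delta and eps are just example values you would tune:

    import numpy as np

    def check_gradient(loss_fn, grad_fn, w, delta=1e-6, eps=1e-4):
        """Compare the analytic gradient grad_fn(w) against finite
        differences, one coordinate of w at a time.

        loss_fn(w) -> scalar loss Q(z, w) for a fixed example z
        grad_fn(w) -> vector of partial derivatives dQ/dw_i
        """
        g = grad_fn(w)            # "guessed" analytic gradient
        base = loss_fn(w)         # Q(z, w)
        for i in range(len(w)):
            e_i = np.zeros(len(w))
            e_i[i] = 1.0          # i-th canonical vector
            numeric = (loss_fn(w + delta * e_i) - base) / delta
            if abs(numeric - g[i]) > eps:
                print(f"dim {i}: analytic {g[i]:.6f} vs numeric {numeric:.6f}")
                return False
        return True

If the check fails only after you add the penalty term, the gradient of the penalty is the first place to look.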
Upvotes: 0