TobSta

Reputation: 786

Check the gradients using finite differences

I'm debugging my constrained stochastic gradient descent algorithm, and the paper http://research.microsoft.com/pubs/192769/tricks-2012.pdf suggests checking the gradients using finite differences. I added a penalty function, but the model no longer converges, so I want to check my gradient as suggested in the paper. The paper describes the check as follows:

  1. Pick an example z.
  2. Compute the loss Q(z, w) for the current w.
  3. Compute the gradient g = ∇w Q(z, w).
  4. Apply a slight perturbation w′ = w + δ. For instance, change a single weight by a small increment, or use δ = −γg with γ small enough.
  5. Compute the new loss Q(z, w′) and verify that Q(z, w′) ≈ Q(z, w) + δg.
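To make this concrete, here is roughly what I have, with a toy squared-error loss standing in for my actual model and penalty term (the names and data are just for illustration, not my real code):

```python
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.normal(size=4000), 1.0      # one training example z = (x, y)
w = rng.normal(size=4000)              # current weight vector, ~4000 features

def loss(z, w):                        # stand-in for my Q(z, w)
    x, y = z
    return 0.5 * (x.dot(w) - y) ** 2   # a single scalar value

def grad(z, w):                        # g = ∇w Q(z, w)
    x, y = z
    return (x.dot(w) - y) * x          # one partial derivative per weight

q = loss((x, y), w)                    # step 2: loss at the current w
g = grad((x, y), w)                    # step 3: gradient, a 4000-dimensional vector
gamma = 1e-6
w_new = w + (-gamma) * g               # step 4: w' = w + δ with δ = −γg
q_new = loss((x, y), w_new)            # step 5: new loss, but how do I form Q(z, w) + δg?
```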

So I can pick an example and compute the loss for this example, but my weight vector contains ~4000 features, so my gradient is a vector with that many partial derivatives, while the loss is a single scalar value, so I don't see how to compute Q(z, w) + δg. Do I have to compute the loss for a single feature of w only? Is that what is meant by "the current w"?

Upvotes: 1

Views: 1915

Answers (1)

lejlot

Reputation: 66815

The equation in the paper looks odd because it is not described carefully. To check the gradient, you usually compare your analytically computed ("guessed") gradient g against a numerical gradient whose ith component is

( Q(z, w + delta*e_i) - Q(z, w) ) / ( delta )

for a small enough delta, where e_i is the ith canonical basis vector (1 in the ith dimension and 0 everywhere else), and you check that the difference is small. In other words, if g_i denotes the ith component of your gradient, you need to check whether

| ( Q(z, w + delta*e_i) - Q(z, w) ) / ( delta ) - g_i | < eps

or equivalently, after multiplying both sides by delta,

| Q(z, w + delta*e_i) - Q(z, w) - delta * g_i | < delta*eps

which boils down to checking

| Q(z, w + delta*e_i) - ( Q(z, w) + delta * g_i ) | < delta*eps

i.e., checking whether

Q(z, w + delta*e_i) ≈ ( Q(z, w) + delta * g_i )

which is exactly the equation from the paper, applied feature-wise (one coordinate at a time).
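A minimal sketch of this feature-wise check, assuming NumPy arrays and placeholder callables loss(z, w) and grad(z, w) for Q(z, w) and your analytic gradient (these names are illustrative, not from your code):

```python
import numpy as np

def check_gradient(loss, grad, z, w, delta=1e-6, eps=1e-4):
    """Compare the analytic gradient with a finite-difference estimate, coordinate by coordinate."""
    q = loss(z, w)                      # Q(z, w) at the current weights
    g = grad(z, w)                      # analytic ("guessed") gradient
    for i in range(len(w)):
        e_i = np.zeros_like(w)
        e_i[i] = 1.0                    # ith canonical basis vector
        numeric = (loss(z, w + delta * e_i) - q) / delta
        if abs(numeric - g[i]) > eps:   # |numeric - analytic| should be tiny
            print(f"mismatch at coordinate {i}: analytic {g[i]:.6g}, numeric {numeric:.6g}")
            return False
    return True
```

Note that with ~4000 weights this loop evaluates the loss ~4000 times per example, so it is a debugging aid rather than something to run during training; a centered difference, ( Q(z, w + delta*e_i) - Q(z, w - delta*e_i) ) / ( 2*delta ), is a common, more accurate variant.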

Upvotes: 0
