Reputation: 6159
It seems that tf.gradients also allows computing Jacobians, i.e. the partial derivatives of each entry of one tensor w.r.t. each entry of another tensor, whereas tf.train.Optimizer.compute_gradients only computes actual gradients, e.g. the partial derivatives of a scalar value w.r.t. each entry of a particular tensor or w.r.t. one particular scalar. Why is there a separate function if tf.gradients already implements that functionality?
Upvotes: 4
Views: 2252
Reputation: 59731
tf.gradients does not allow you to compute Jacobians: for each input it aggregates the gradients over all outputs (something like the sum of each column of the actual Jacobian matrix). In fact, there is no "good" way of computing Jacobians in TensorFlow (basically you have to call tf.gradients once per output, see this issue).
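For illustration, here is a minimal sketch of that workaround (TF 1.x graph mode; the names x, y and jacobian_rows are just placeholders): one tf.gradients call per output entry yields the rows of the Jacobian, whereas a single call over all outputs only returns the column sums.

```python
import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[3])
y = tf.stack([x[0] * x[1], x[1] + x[2]])  # 2 outputs, 3 inputs

# One tf.gradients call per output entry: each call gives one row of the Jacobian.
jacobian_rows = [tf.gradients(y[i], x)[0] for i in range(2)]  # 2 = number of outputs
jacobian = tf.stack(jacobian_rows)  # shape [2, 3]

# A single call aggregates over the outputs: sum_i dy_i/dx_j, shape [3].
aggregated = tf.gradients(y, x)[0]

with tf.Session() as sess:
    jac, agg = sess.run([jacobian, aggregated], feed_dict={x: [1.0, 2.0, 3.0]})
    print(jac)  # [[2. 1. 0.], [0. 1. 1.]]
    print(agg)  # [2. 2. 1.]
```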
With respect to tf.train.Optimizer.compute_gradients, yes, its result is basically the same, but it takes care of some details automatically and returns a slightly more convenient output format. If you look at the implementation, you will see that, at its core, it is a call to tf.gradients (in this case aliased to gradients.gradients), but it is useful for optimizer implementations to have the surrounding logic already in place. Also, having it as a method allows for extensible behaviour in subclasses, either to implement some kind of optimization strategy (not very likely at the compute_gradients step, really) or for auxiliary purposes, like tracing or debugging.
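As a rough sketch of that relationship (TF 1.x; the names w, loss and opt are just for illustration): compute_gradients returns (gradient, variable) pairs ready to be fed to apply_gradients, while the underlying tf.gradients call returns the same gradient tensors as a plain list.

```python
import tensorflow as tf

w = tf.Variable([1.0, 2.0])
loss = tf.reduce_sum(w * w)  # gradient w.r.t. w is 2*w

opt = tf.train.GradientDescentOptimizer(learning_rate=0.1)

# Optimizer method: list of (gradient, variable) pairs, convenient for apply_gradients.
grads_and_vars = opt.compute_gradients(loss, var_list=[w])

# Core computation it wraps: a plain list of gradient tensors.
plain_grads = tf.gradients(loss, [w])

train_op = opt.apply_gradients(grads_and_vars)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    g1, g2 = sess.run([grads_and_vars[0][0], plain_grads[0]])
    print(g1, g2)  # both [2. 4.]
```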
Upvotes: 5