rihimetebmahk

Reputation: 61

tf.gradients() vs tf.GradientTape.gradient() in graph mode

I had a question regarding the behavior of tf.gradients() as opposed to tf.GradientTape.gradient() in graph mode.

Given a differentiable function y = f(x), where x and y are each single TensorFlow tensors, is there any difference between the behavior of tf.gradients(y, x) and tape.gradient(y, x), where tape is an instance of tf.GradientTape (assuming the use of graph mode)?

I'm not sure why TensorFlow has two different gradient methods that can be used with graph mode; maybe there are some subtle differences in the implementations? I've looked at the documentation for tf.GradientTape and tf.gradients, but it isn't clear whether the two methods behave differently for a single (x, y) pair, or whether tf.gradients() can simply be used in this case for a speedup when using graph mode.

Thank you so much for your help!

Upvotes: 5

Views: 1871

Answers (1)

user11530462

tf.gradients

tf.gradients is only valid in a graph context. In particular, it works inside a tf.function wrapper, where the code executes as a graph.

import tensorflow as tf

@tf.function
def example():
  a = tf.constant(0.)
  b = 2 * a
  # stop_gradients treats a and b as constants, so d(a+b)/da = 1 and d(a+b)/db = 1
  return tf.gradients(a + b, [a, b], stop_gradients=[a, b])

example()  # returns [1.0, 1.0]
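
As a rough sketch (not from the original answer): if tf.gradients is called outside any graph context, i.e. with eager execution enabled and no tf.function, it should fail with a RuntimeError pointing you to tf.GradientTape instead.

import tensorflow as tf

# With eager execution enabled (the default, no @tf.function),
# tf.gradients is not available and raises a RuntimeError.
x = tf.constant(1.0)
try:
  tf.gradients(x * x, x)
except RuntimeError as e:
  print(e)  # message suggests using tf.GradientTape instead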

tf.GradientTape

TensorFlow provides the tf.GradientTape API for automatic differentiation. TensorFlow "records" the relevant operations executed inside the context of a tf.GradientTape onto a "tape", then uses that tape to compute the gradients of the "recorded" computation using reverse-mode differentiation. tf.GradientTape does not really require a tf.function wrapper: it works with eager execution, and the same code also runs in graph mode when wrapped in a tf.function.

x = tf.constant(3.0)
with tf.GradientTape() as g:
  g.watch(x)          # x is a plain tensor, so it must be watched explicitly
  y = x * x           # operations inside the tape context are recorded
dy_dx = g.gradient(y, x)  # d(x^2)/dx = 2x = 6.0
print(dy_dx)
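
To come back to the question: for a single differentiable (x, y) pair in graph mode, both methods should produce the same gradient values. A minimal sketch (my own comparison, not from either API's docs) computing dy/dx for y = x * x both ways inside one tf.function:

import tensorflow as tf

@tf.function
def both_gradients(x):
  # tape.gradient works both eagerly and inside tf.function.
  with tf.GradientTape() as tape:
    tape.watch(x)      # x is a plain tensor, so watch it explicitly
    y = x * x
  tape_grad = tape.gradient(y, x)

  # tf.gradients is legal here only because tf.function builds a graph.
  graph_grad = tf.gradients(x * x, x)[0]
  return tape_grad, graph_grad

print(both_gradients(tf.constant(3.0)))  # both gradients should be 6.0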

Upvotes: 2
