Reputation: 61
I had a question regarding the behavior of tf.gradients() as opposed to tf.GradientTape.gradient() in graph mode.
Given a differentiable function y = f(x), where x and y are each single TensorFlow tensors, is there any difference between the behavior of tf.gradients(y, x) and tape.gradient(y, x), where tape is an instance of tf.GradientTape (assuming the use of graph mode)?
I'm not sure why TensorFlow has two different gradient methods that can be used with graph mode - maybe there are some subtle differences in the implementations? I've looked at the documentation for tf.GradientTape and tf.gradients, but it's not clear whether there is any difference in behavior between these methods for a single (x, y) pair, or whether it's just that tf.gradients() can be used in this case for a speedup when using graph mode.
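For concreteness, here is a minimal sketch of the two calls I'm comparing, using y = x * x as a toy example (the function names are just for illustration):
import tensorflow as tf

# tf.gradients only works inside a graph, so it needs a tf.function wrapper
@tf.function
def grad_with_tf_gradients():
    x = tf.constant(3.0)
    y = x * x
    return tf.gradients(y, [x])[0]

# tf.GradientTape records the forward pass and differentiates it afterwards
def grad_with_tape():
    x = tf.constant(3.0)
    with tf.GradientTape() as tape:
        tape.watch(x)  # x is a constant, so it must be watched explicitly
        y = x * x
    return tape.gradient(y, x)

print(grad_with_tf_gradients())  # 6.0
print(grad_with_tape())          # 6.0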
Thank you so much for your help!
Upvotes: 5
Views: 1871
Reputation:
tf.gradients
tf.gradients is only valid in a graph context. In particular, it is valid in the context of a tf.function wrapper, where code is executing as a graph.
import tensorflow as tf

@tf.function
def example():
    a = tf.constant(0.)
    b = 2 * a
    # stop_gradients treats a and b as constants for the differentiation
    return tf.gradients(a + b, [a, b], stop_gradients=[a, b])

example()
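Because stop_gradients treats a and b as constants, this returns [1.0, 1.0]; without stop_gradients, the gradient with respect to a would be 3.0, since b = 2 * a also depends on a.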
tf.GradientTape
TensorFlow provides the tf.GradientTape API for automatic differentiation. TensorFlow "records" relevant operations executed inside the context of a tf.GradientTape onto a "tape". TensorFlow then uses that tape to compute the gradients of the "recorded" computation using reverse-mode differentiation. tf.GradientTape does not really require a tf.function wrapper: it works in eager execution, and when used inside a tf.function it automatically runs as part of the graph.
import tensorflow as tf

x = tf.constant(3.0)
with tf.GradientTape() as g:
    g.watch(x)  # constants are not watched automatically, so watch x explicitly
    y = x * x
dy_dx = g.gradient(y, x)  # dy/dx = 2 * x = 6.0
print(dy_dx)
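To connect this back to graph mode: as a minimal sketch (the function name tape_in_graph_mode is just illustrative), the same tape-based computation can be placed inside a tf.function, where it is traced and executed as a graph:
import tensorflow as tf

@tf.function
def tape_in_graph_mode(x):
    # the tape records the operations while the traced graph runs
    with tf.GradientTape() as g:
        g.watch(x)
        y = x * x
    return g.gradient(y, x)

print(tape_in_graph_mode(tf.constant(3.0)))  # tf.Tensor(6.0, shape=(), dtype=float32)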
Upvotes: 2