Starting from a TensorFlow model, I would like to be able to retrieve the gradient of the outputs with respect to the weights. Backpropagation computes the gradient of the loss with respect to the weights, so somewhere in the code the gradient of the outputs with respect to the weights has to be computed.
But how do I get this Jacobian at the API level? Any ideas?
I know we can get access to the tape, but I am not sure what to do with it. Actually, I do not need the whole Jacobian; I just need to compute the matrix-vector product J^T v, where J^T is the transpose of the Jacobian and v is a given vector.
Thank you, Regards.
If you only need the vector-Jacobian product, computing just that is much more efficient than computing the full Jacobian. For a function with N outputs, building the full Jacobian requires N backward passes, i.e. O(N) work, whereas a single vector-Jacobian product needs only one backward pass, i.e. O(1).
So how do you compute a vector-Jacobian product in TensorFlow? The trick is the output_gradients keyword argument of the gradient function: set it to the vector in the vector-Jacobian product. Let's look at an example.
import tensorflow as tf

with tf.GradientTape() as g:
    x = tf.constant([1.0, 2.0])
    g.watch(x)   # constants are not watched automatically
    y = x * x    # y is a length-2 vector

vec = tf.constant([2.0, 3.0])  # the vector v in the vector-Jacobian product
grad = g.gradient(y, x, output_gradients=vec)
print(grad)  # prints the vector-Jacobian product, [ 4. 12.]
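If you want to sanity-check the result against the full Jacobian, here is a minimal sketch using tf.GradientTape.jacobian. The tape has to be persistent so both gradient and jacobian can be called on it:

import tensorflow as tf

# Persistent tape, so we can call both gradient() and jacobian() on it.
with tf.GradientTape(persistent=True) as g:
    x = tf.constant([1.0, 2.0])
    g.watch(x)
    y = x * x

vec = tf.constant([2.0, 3.0])
vjp = g.gradient(y, x, output_gradients=vec)  # one backward pass
J = g.jacobian(y, x)                          # full 2x2 Jacobian, one pass per output
print(vjp)                                         # [ 4. 12.]
print(tf.linalg.matvec(J, vec, transpose_a=True))  # J^T v, same values
del g  # release the persistent tape's resources

On this tiny example the difference is invisible, but for a function with many outputs the jacobian call is exactly where the extra O(N) backward passes go.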
Note: if you ask TensorFlow for the gradient of a vector-valued (rather than scalar) function without setting output_gradients, it computes a vector-Jacobian product where the vector is set to all ones. For example:
import tensorflow as tf

with tf.GradientTape() as g:
    x = tf.constant([1.0, 2.0])
    g.watch(x)
    y = x * x  # y is a length-2 vector

grad = g.gradient(y, x)
print(grad)  # the vector-Jacobian product with a vector of ones, [2. 4.]
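Applied to your actual question, i.e. the outputs of a model with respect to its weights, the same output_gradients trick works; the tape watches the trainable variables automatically. A minimal sketch, assuming a toy Dense model (the model and shapes here are hypothetical placeholders):

import tensorflow as tf

# Hypothetical small model; any tf.keras model works the same way.
model = tf.keras.Sequential([tf.keras.layers.Dense(3, input_shape=(4,))])
x = tf.random.normal([5, 4])  # a batch of inputs
v = tf.random.normal([5, 3])  # vector v, same shape as the model output

with tf.GradientTape() as g:
    y = model(x)  # trainable variables are watched automatically

# One entry of J^T v per trainable variable (kernel and bias here)
vjp = g.gradient(y, model.trainable_variables, output_gradients=v)
for var, grad in zip(model.trainable_variables, vjp):
    print(var.name, grad.shape)  # kernel: (4, 3), bias: (3,)

Each entry of vjp has the shape of the corresponding weight tensor, so you never materialize the full (outputs x weights) Jacobian.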