Julie

Reputation: 21

Compute gradient of the outputs wrt the weights

Starting from a TensorFlow model, I would like to be able to retrieve the gradient of the outputs with respect to the weights. Backpropagation computes the gradient of the loss with respect to the weights, and by the chain rule this means that somewhere in the code the gradient of the outputs with respect to the weights has to be computed.

But I am wondering how to get this Jacobian at the API level. Any ideas?

I know that we can have access to the tape, but I am not sure what to do with it. Actually, I do not need the whole Jacobian: I just need to be able to compute the matrix-vector product J^T v, where J^T is the transpose of the Jacobian and v is a given vector.
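To make this concrete, here is a sketch of the brute-force version I would like to avoid (the model and the shapes are just placeholders): it materializes the full Jacobian with tape.jacobian and only then contracts it with v.

import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(3)])  # placeholder model
x = tf.ones((1, 4))
model(x)  # build the weights

with tf.GradientTape() as tape:
    y = model(x)  # outputs, shape (1, 3)

v = tf.random.normal(y.shape)  # the given vector v
# Full Jacobian of the outputs wrt each weight tensor -- expensive
jacs = tape.jacobian(y, model.trainable_variables)
# Contract over the output axes to get J^T v for each weight tensor
jtv = [tf.tensordot(v, j, axes=[[0, 1], [0, 1]]) for j in jacs]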

Thank you, Regards.

Upvotes: 2

Views: 958

Answers (1)

Nick McGreivy

Reputation: 688

If you only need to compute the vector-Jacobian product, doing only that will be much more efficient than computing the full Jacobian. Computing the full Jacobian of a function with N outputs costs O(N) backward passes, as opposed to O(1), a single backward pass, for a vector-Jacobian product.

So how do you compute a vector-Jacobian product in TensorFlow? The trick is to use the output_gradients keyword argument of the gradient function: set output_gradients to the vector in the vector-Jacobian product. Let's look at an example.

import tensorflow as tf

with tf.GradientTape() as g:
    x = tf.constant([1.0, 2.0])
    g.watch(x)     # track gradients with respect to x
    y = x * x      # y is a length-2 vector

vec = tf.constant([2.0, 3.0])  # the vector in the vector-Jacobian product

grad = g.gradient(y, x, output_gradients=vec)
print(grad)  # prints the vector-Jacobian product, [4., 12.]
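The same keyword answers the original question for model weights: pass the trainable variables as the sources and v as output_gradients, and you get J^T v for every weight tensor without ever forming the Jacobian. A minimal sketch with a placeholder Keras model (the architecture and shapes are arbitrary):

import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(3)])  # placeholder model
x = tf.ones((1, 4))
model(x)  # build the weights

with tf.GradientTape() as tape:
    y = model(x)  # model outputs, shape (1, 3)

v = tf.constant([[2.0, 3.0, 4.0]])  # the given vector, same shape as y
# J^T v for each trainable weight tensor, in a single backward pass
vjp = tape.gradient(y, model.trainable_variables, output_gradients=v)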

Note: if you try to compute the gradient of a vector-valued (rather than scalar) function in TensorFlow without setting output_gradients, it computes a vector-Jacobian product in which the vector is set to all ones. For example,

import tensorflow as tf

with tf.GradientTape() as g:
    x = tf.constant([1.0, 2.0])
    g.watch(x)     # track gradients with respect to x
    y = x * x      # y is a length-2 vector

grad = g.gradient(y, x)
print(grad)  # prints the vector-Jacobian product with a vector of ones, [2., 4.]
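If you want to convince yourself that the one-pass product and the full Jacobian agree, a persistent tape lets you compute both (a sanity-check sketch only, since jacobian is the O(N) path you are trying to avoid):

import tensorflow as tf

x = tf.constant([1.0, 2.0])
with tf.GradientTape(persistent=True) as g:
    g.watch(x)
    y = x * x

vec = tf.constant([2.0, 3.0])
vjp = g.gradient(y, x, output_gradients=vec)  # one backward pass
jac = g.jacobian(y, x)                        # full 2x2 Jacobian, O(N) passes
explicit = tf.linalg.matvec(jac, vec, transpose_a=True)  # J^T v explicitly
print(vjp, explicit)  # both print [4., 12.]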

Upvotes: 1
