Reputation: 820
I have a trained model called net; the last layer (output layer) is a Dense layer with 10 units and a linear activation function. When I calculate the gradient like this, everything works fine:
with tf.GradientTape(persistent=True) as tape:
    output = net(x)
grad = tape.gradient(output, x)
output is a tf.Tensor with shape (1, 10).
Now when I try to calculate the gradient from only one of the 10 output units, grad is None. For the first unit, for example, I calculate it like this:
with tf.GradientTape(persistent=True) as tape:
    output = net(x)
grad = tape.gradient(output[0, 0], x)
output[0, 0] is a tf.Tensor.
What is the correct way to calculate these gradients?
Upvotes: 1
Views: 211
Reputation: 10474
It's quite simple actually: You need to do everything, including indexing, inside the tape context. Meaning:
with tf.GradientTape(persistent=True) as tape:
    output = net(x)[0, 0]
grad = tape.gradient(output, x)
This should work as intended. Keep in mind that even something simple like indexing into a tensor is an "operation" that has a gradient defined and that needs to be backpropagated through. If you do it outside of the tape context, the tape basically "loses track" of the sequence of operations and cannot compute gradients anymore. By moving the indexing into the context, the problem is solved.
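For reference, here is a minimal self-contained sketch of the fix, assuming a small hypothetical Dense model and a plain-tensor input (which must be watched explicitly, since only tf.Variables are tracked automatically):

import tensorflow as tf

# Hypothetical stand-in for the trained model: 10-unit linear output layer.
net = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(10, activation="linear"),
])
x = tf.random.normal((1, 4))  # hypothetical input shape

with tf.GradientTape(persistent=True) as tape:
    tape.watch(x)          # needed because x is a plain tensor, not a tf.Variable
    output = net(x)[0, 0]  # index inside the tape so the slicing op is recorded
grad = tape.gradient(output, x)  # shape (1, 4), no longer None

If you need the gradient of every output unit rather than just one, tape.jacobian(output, x) on the full (1, 10) output tensor computes all of them in a single call.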
Upvotes: 2