Reputation: 820
I have a trained model called net; the last layer (output layer) is a Dense layer with 10 units and a linear activation function. When I calculate the gradient like this, everything works fine:
with tf.GradientTape(persistent=True) as tape:
    output = net(x)
grad = tape.gradient(output, x)
output is a tf.Tensor with shape (1, 10).
Now when I try to calculate the gradient from only one of the 10 output units, grad is None. For the first unit, for example, I calculate it like this:
with tf.GradientTape(persistent=True) as tape:
    output = net(x)
grad = tape.gradient(output[0, 0], x)
output[0, 0] is a tf.Tensor.
What is the correct way to calculate these gradients?
Upvotes: 1
Views: 211
Reputation: 10474
It's quite simple actually: You need to do everything, including indexing, inside the tape context. Meaning:
with tf.GradientTape(persistent=True) as tape:
    output = net(x)[0, 0]
grad = tape.gradient(output, x)
This should work as intended. Keep in mind that even something simple like indexing into a tensor is an "operation" that has a gradient defined and that needs to be backpropagated through. If you do it outside of the tape context, the tape basically "loses track" of the sequence of operations and cannot compute gradients anymore. By moving the indexing into the context, the problem is solved.
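For reference, here is a minimal self-contained sketch of the fix, assuming a small hypothetical Dense model and a plain-tensor input (which must be watched explicitly, since only tf.Variables are tracked automatically):

import tensorflow as tf

# Hypothetical stand-in for the trained model: 10-unit linear output layer.
net = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(10, activation="linear"),
])
x = tf.random.normal((1, 4))  # hypothetical input shape

with tf.GradientTape(persistent=True) as tape:
    tape.watch(x)          # needed because x is a plain tensor, not a tf.Variable
    output = net(x)[0, 0]  # index inside the tape so the slicing op is recorded
grad = tape.gradient(output, x)  # shape (1, 4), no longer None

If you need the gradient of every output unit rather than just one, tape.jacobian(output, x) on the full (1, 10) output tensor computes all of them in a single call.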
Upvotes: 2