Reputation: 345
I would like to calculate the gradients of the output of a neural network with respect to the input. I have the following tensors:
Input: (num_timesteps, features)
Output: (num_timesteps, 1)
For the gradient of the entire output vector with respect to the input I can use the following:
tf.gradients(Output, Input)
Since I would like to compute the gradient for every single timestep, I would like to calculate
tf.gradients(Output[i], Input)
for every i.
What is the best way to do that?
Upvotes: 1
Views: 1643
Reputation: 53758
First up, I suppose you mean the gradient of Output with respect to the Input.
Now, the result of both of these calls:
dO = tf.gradients(Output, Input)
dO_i = tf.gradients(Output[i], Input)
(for any valid i) will be a list with a single element: a tensor with the same shape as Input, namely a [num_timesteps, features] matrix. Also, the sum of all matrices dO_i (over all valid i) is exactly the matrix dO.
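To make the shapes concrete, here is a minimal sketch, assuming the TF 1.x graph API (tf.compat.v1 on newer installs) and a stand-in dense layer in place of your actual network:
import tensorflow.compat.v1 as tf   # TF 1.x-style graph API
tf.disable_eager_execution()

num_timesteps, features = 5, 3
Input = tf.placeholder(tf.float32, [num_timesteps, features])
Output = tf.layers.dense(Input, 1)          # stand-in network, shape [num_timesteps, 1]

dO   = tf.gradients(Output, Input)[0]       # tensor of shape [num_timesteps, features]
dO_0 = tf.gradients(Output[0], Input)[0]    # same shape; one such matrix per timestep i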
With this in mind, back to your question. In many cases, individual rows of the Input are independent, meaning that Output[i] is calculated only from Input[i] and doesn't depend on the other inputs (typical case: batch processing without batchnorm). If that is your case, then dO is going to give you all individual components dO_i at once.
This is because each dO_i matrix is going to look like this:
[[ 0. 0. 0.]
[ 0. 0. 0.]
...
[ 0. 0. 0.]
[ xxx xxx xxx] <- i-th row
[ 0. 0. 0.]
...
[ 0. 0. 0.]]
All rows are going to be 0, except for the i-th one. So just by computing one matrix dO, you can easily get every dO_i. This is very efficient.
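Continuing the sketch above (where the dense layer treats every row independently), you can check numerically that row i of dO is exactly the only non-zero row of dO_i:
import numpy as np

dO_2 = tf.gradients(Output[2], Input)[0]    # only row 2 can be non-zero here

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    x = np.random.rand(num_timesteps, features).astype(np.float32)
    full, single = sess.run([dO, dO_2], feed_dict={Input: x})
    print(np.allclose(full[2], single[2]))  # True: row 2 of dO equals the non-zero row of dO_2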
However, if that's not your case and all Output[i] depend on all inputs, there's no way to extract the individual dO_i from their sum alone. You have no choice but to calculate each gradient separately: just iterate over i and execute tf.gradients.
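A sketch of that loop, continuing the same graph (note that this adds num_timesteps separate gradient ops, so graph construction can get expensive for long sequences):
# One tf.gradients call per timestep; works even when Output[i] depends on all inputs
grads_per_step = [tf.gradients(Output[i], Input)[0] for i in range(num_timesteps)]
jacobian = tf.stack(grads_per_step)         # shape [num_timesteps, num_timesteps, features]
If eager execution (TF 2.x) is an option for you, tf.GradientTape's jacobian method gives you the same [num_timesteps, num_timesteps, features] tensor in a single call.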
Upvotes: 1