Torben

Reputation: 345

Tensorflow: Gradient Calculation from Input to Output

I would like to calculate the gradients of the output of a neural network with respect to the input. I have the following tensors:

Input: (num_timesteps, features)

Output: (num_timesteps, 1)

For the gradient of the entire output vector with respect to the input I can use the following:

tf.gradients(Output, Input)

Since I would like to compute the gradients for every single timestep, I would like to calculate

tf.gradients(Output[i], Input)

for every i.

What is the best way to do that?

Upvotes: 1

Views: 1643

Answers (1)

Maxim

Reputation: 53758

First up, I suppose you mean the gradient of Output with respect to the Input.

Now, the result of both of these calls:

  • dO = tf.gradients(Output, Input)
  • dO_i = tf.gradients(Output[i], Input) (for any valid i)

will be a list with a single element - a tensor with the same shape as Input, namely a [num_timesteps, features] matrix. Also, the sum of all the matrices dO_i (over all valid i) is exactly the matrix dO.
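Here is a minimal sketch of both calls (TF 1.x graph mode; the dense layer, shapes and names are illustrative assumptions, not your actual model):

import tensorflow as tf

num_timesteps, features = 5, 3
Input = tf.placeholder(tf.float32, [num_timesteps, features])
Output = tf.layers.dense(Input, 1)        # shape [num_timesteps, 1]

dO = tf.gradients(Output, Input)          # a list with a single element
print(dO[0].shape)                        # (5, 3), same shape as Input

dO_0 = tf.gradients(Output[0], Input)     # also a single [5, 3] tensor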

With this in mind, back to your question. In many cases, individual rows of the Input are independent, meaning that Output[i] is calculated only from Input[i] and doesn't depend on the other inputs (typical case: batch processing without batchnorm). If that is your case, then dO gives you all the individual components dO_i at once.

This is because each dO_i matrix is going to look like this:

[[  0.   0.   0.]
 [  0.   0.   0.]
 ...
 [  0.   0.   0.]
 [ xxx  xxx  xxx]     <- i-th row
 [  0.   0.   0.]
 ...
 [  0.   0.   0.]]

All rows are going to be 0, except for the i-th one. So just by computing one matrix dO, you can easily get every dO_i. This is very efficient.
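A small sketch of this independent case (a plain linear map applied row-wise, so Output[i] only sees Input[i]; TF 1.x graph mode, all names are illustrative):

import numpy as np
import tensorflow as tf

num_timesteps, features = 4, 3
Input = tf.placeholder(tf.float32, [num_timesteps, features])
W = tf.Variable(tf.random_normal([features, 1]))
Output = tf.matmul(Input, W)                   # [num_timesteps, 1], row-wise

dO = tf.gradients(Output, Input)[0]            # one [4, 3] matrix
dO_2 = tf.gradients(Output[2], Input)[0]       # only its 3rd row is non-zero

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    x = np.random.rand(num_timesteps, features).astype(np.float32)
    full, single = sess.run([dO, dO_2], feed_dict={Input: x})
    print(np.allclose(single[2], full[2]))     # True: row i of dO is dO_i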

However, if that's not your case and every Output[i] depends on all inputs, there's no way to extract the individual dO_i from their sum alone. You have no choice other than to calculate each gradient separately: just iterate over i and call tf.gradients, as in the sketch below.
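A sketch of that fallback (the cross-timestep dense layer is only there to make every output depend on every input; TF 1.x, names are illustrative). Note the graph grows linearly with num_timesteps, so this is fine for small sequences but gets expensive for long ones:

import tensorflow as tf

num_timesteps, features = 4, 3
Input = tf.placeholder(tf.float32, [num_timesteps, features])
# Illustrative model in which every Output[i] depends on all rows of Input
flat = tf.reshape(Input, [1, num_timesteps * features])
Output = tf.reshape(tf.layers.dense(flat, num_timesteps), [num_timesteps, 1])

# One tf.gradients call per timestep
dO_list = [tf.gradients(Output[i], Input)[0]   # each is [num_timesteps, features]
           for i in range(num_timesteps)]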

Upvotes: 1
