Reputation: 89
In my model, I'm using tf.keras.losses.MSE to calculate the mean squared error between my BATCH_SIZE x 256 x 256 x 3 output and my BATCH_SIZE x 256 x 256 x 3 input.
The output of this function appears to have shape (None, 256, 256).
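For example, a minimal check of that shape (batch size 4 assumed here, with dummy tensors standing in for my input and output):

import tensorflow as tf

# Random tensors with the shapes described above
y_true = tf.random.uniform((4, 256, 256, 3))
y_pred = tf.random.uniform((4, 256, 256, 3))

loss = tf.keras.losses.MSE(y_true, y_pred)
print(loss.shape)  # (4, 256, 256) -- MSE averages over the last (channel) axis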
I then use tf.GradientTape.gradient, with the MSE output as the target argument. The documentation says this argument can be a tensor.
My understanding is that the loss is a scalar which is differentiated with respect to each of the weights during backpropagation.
Therefore, my question is: what happens when a multi-dimensional tensor is passed into the gradient function? Is the sum of all elements in the tensor simply calculated first?
I ask this because my model is not training at the moment: the loss reads 1.0 at every epoch. My assumption is that I am not calculating the gradients correctly, since all my gradients read 0.0 for every weight.
Upvotes: 0
Views: 1105
Reputation: 791
import tensorflow as tf

x = tf.Variable([3.0, 2.0])
with tf.GradientTape() as g:
    g.watch(x)   # not strictly needed for a tf.Variable, but harmless
    y = x * x    # element-wise square, so y = [9.0, 4.0]
dy_dx = g.gradient(y, x)

print(dy_dx)
print(y)
Result:
tf.Tensor([6. 4.], shape=(2,), dtype=float32)
tf.Tensor([9. 4.], shape=(2,), dtype=float32)
As the example above shows, tf.GradientTape.gradient returns one gradient value per element of x. Strictly speaking, when the target is not a scalar, TensorFlow differentiates the sum of its elements; the result here only looks like an element-wise derivative because each y_i depends on just the corresponding x_i, so d(sum(y))/dx_i = dy_i/dx_i = 2 x_i.
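One way to check this yourself (a sketch using a persistent tape so both gradients can be taken from the same recording):

import tensorflow as tf

x = tf.Variable([3.0, 2.0])
with tf.GradientTape(persistent=True) as g:
    y = x * x              # non-scalar target
    s = tf.reduce_sum(y)   # explicit scalar sum of the same target

grad_y = g.gradient(y, x)  # gradient of the implicit sum of y's elements
grad_s = g.gradient(s, x)  # gradient of the explicit sum
del g                      # release the persistent tape

print(grad_y)  # tf.Tensor([6. 4.], shape=(2,), dtype=float32)
print(grad_s)  # tf.Tensor([6. 4.], shape=(2,), dtype=float32) -- identical

So for your (None, 256, 256) loss tensor, calling gradient is equivalent to differentiating the sum of all the per-pixel losses. If you want the usual scalar mean loss, reduce it first with tf.reduce_mean before calling gradient.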
Upvotes: 0