Reputation: 89
In my model, I'm using tf.keras.losses.MSE to calculate the mean squared error between my BATCH_SIZE x 256 x 256 x 3 output and my BATCH_SIZE x 256 x 256 x 3 input.
The output of this function appears to have shape (None, 256, 256).
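For example, a minimal check of that shape (batch size 4 assumed here, with dummy tensors standing in for my input and output):

import tensorflow as tf

# Random tensors with the shapes described above
y_true = tf.random.uniform((4, 256, 256, 3))
y_pred = tf.random.uniform((4, 256, 256, 3))

loss = tf.keras.losses.MSE(y_true, y_pred)
print(loss.shape)  # (4, 256, 256) -- MSE averages over the last (channel) axis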
I then use tf.GradientTape.gradient, with the MSE output as the target argument. The documentation says this argument can be a tensor.
My understanding is that the loss is a scalar which is differentiated with respect to each of the weights during backpropagation.
Therefore, my question is: what happens when a multi-dimensional tensor is passed into the gradient function? Is the sum of all elements in the tensor simply calculated first?
I ask this because my model is not training at the moment: the loss reads 1.0 at every epoch. My assumption is that I am not calculating the gradients correctly, since all my gradients read 0.0 for every weight.
Upvotes: 0
Views: 1105
Reputation: 791
import tensorflow as tf

x = tf.Variable([3.0, 2.0])
with tf.GradientTape() as g:
    g.watch(x)   # not strictly needed for a tf.Variable, but harmless
    y = x * x    # element-wise square, so y = [9.0, 4.0]
dy_dx = g.gradient(y, x)

print(dy_dx)
print(y)
Result:
tf.Tensor([6. 4.], shape=(2,), dtype=float32)
tf.Tensor([9. 4.], shape=(2,), dtype=float32)
As the example above shows, tf.GradientTape.gradient returns one gradient value per element of x. Strictly speaking, when the target is not a scalar, TensorFlow differentiates the sum of its elements; the result here only looks like an element-wise derivative because each y_i depends on just the corresponding x_i, so d(sum(y))/dx_i = dy_i/dx_i = 2 x_i.
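One way to check this yourself (a sketch using a persistent tape so both gradients can be taken from the same recording):

import tensorflow as tf

x = tf.Variable([3.0, 2.0])
with tf.GradientTape(persistent=True) as g:
    y = x * x              # non-scalar target
    s = tf.reduce_sum(y)   # explicit scalar sum of the same target

grad_y = g.gradient(y, x)  # gradient of the implicit sum of y's elements
grad_s = g.gradient(s, x)  # gradient of the explicit sum
del g                      # release the persistent tape

print(grad_y)  # tf.Tensor([6. 4.], shape=(2,), dtype=float32)
print(grad_s)  # tf.Tensor([6. 4.], shape=(2,), dtype=float32) -- identical

So for your (None, 256, 256) loss tensor, calling gradient is equivalent to differentiating the sum of all the per-pixel losses. If you want the usual scalar mean loss, reduce it first with tf.reduce_mean before calling gradient.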
Upvotes: 0