magmacollaris

Reputation: 89

In TensorFlow, what does tf.GradientTape.gradient do when its "target" argument is a multi-dimensional tensor?

In my model, I'm using tf.keras.losses.MSE to calculate the mean squared error between my BATCH_SIZE x 256 x 256 x 3 output and my BATCH_SIZE x 256 x 256 x 3 input.

The output of this function appears to have shape (None, 256, 256).
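That last-axis reduction can be checked with a small sketch (shapes shrunk from 256 x 256 for illustration; the values are hypothetical):

```python
import tensorflow as tf

# tf.keras.losses.MSE averages over the last axis only, so a
# (batch, H, W, 3) prediction/target pair yields a (batch, H, W)
# tensor of per-pixel squared errors -- not a scalar.
y_true = tf.zeros([2, 4, 4, 3])
y_pred = tf.ones([2, 4, 4, 3])
per_pixel = tf.keras.losses.MSE(y_true, y_pred)
print(per_pixel.shape)  # (2, 4, 4)
```

To get a single scalar loss you would still need something like tf.reduce_mean over the remaining axes.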

I then call tf.GradientTape.gradient, passing the MSE output as the "target" argument. The documentation says this argument can be a tensor.

My understanding is that the loss is a scalar that is differentiated with respect to each of the weights during backpropagation.

Therefore, my question is: what happens when a multi-dimensional tensor is passed as the target? Is the sum of all elements in the tensor simply calculated first?

I ask this because my model is not training at the moment: the loss reads 1.0 at every epoch, and all my gradients read 0.0 for every weight, so I assume I am not calculating the gradients correctly.

Upvotes: 0

Views: 1105

Answers (1)

krenerd

Reputation: 791

import tensorflow as tf

x = tf.Variable([3.0, 2.0])
with tf.GradientTape() as g:
  g.watch(x)  # optional here: trainable tf.Variables are watched automatically
  y = x * x   # element-wise square
dy_dx = g.gradient(y, x)  # gradient of the (summed) target w.r.t. x, i.e. 2x
print(dy_dx)
print(y)

Result: 
tf.Tensor([6. 4.], shape=(2,), dtype=float32)
tf.Tensor([9. 4.], shape=(2,), dtype=float32)

As the example above shows, tf.GradientTape.gradient returns dy/dx even when y is not a scalar. With a non-scalar target, it differentiates the sum of the target's elements with respect to each source. Here each y_i depends only on x_i, so the result looks like an element-wise derivative, but in general the gradients of all target elements are summed.
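A minimal sketch makes the equivalence explicit: taking the gradient of the vector target y gives the same result as taking the gradient of tf.reduce_sum(y). (The persistent tape is only so both gradients can be computed from one recording.)

```python
import tensorflow as tf

x = tf.Variable([3.0, 2.0])
with tf.GradientTape(persistent=True) as g:
    y = x * x             # non-scalar target, shape (2,)
    s = tf.reduce_sum(y)  # explicit scalar sum of the same elements
grad_y = g.gradient(y, x)  # gradient of the vector target
grad_s = g.gradient(s, x)  # gradient of its sum
print(grad_y)  # tf.Tensor([6. 4.], ...)
print(grad_s)  # tf.Tensor([6. 4.], ...) -- identical
del g  # release the persistent tape
```

So passing the (None, 256, 256) MSE output directly is equivalent to training on the sum of all per-pixel errors; zero gradients everywhere usually point to a break in the tape (e.g. operations outside the tape context or non-differentiable ops), not to the target's shape.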

Upvotes: 0
