yoki

Reputation: 1816

Breaking TensorFlow gradient calculation into two (or more) parts

Is it possible to use TensorFlow's tf.gradients() function in parts, that is - calculate the gradient of the loss w.r.t. some tensor, and of that tensor w.r.t. the weights, and then multiply them to get the original gradient of the loss w.r.t. the weights?

For example, let W,b be some weights, let x be an input of a network, and let y0 denote labels.

Assume a forward graph such as

h=Wx+b
y=tanh(h)
loss=mse(y-y0)

We can calculate tf.gradients(loss,W) and then apply (skipping some details) optimizer.apply_gradients() to update W.

I then try to extract an intermediate tensor, by using var=tf.get_default_graph().get_tensor_by_name(...), and then calculate two gradients: g1=tf.gradients(loss,var) and g2=tf.gradients(var,W). I would then, by the chain rule, expect the dimensions of g1 and g2 to work out so that I can write g=g1*g2 in some sense, and get back tf.gradients(loss,W).

Unfortunately, this is not the case: the dimensions are incorrect. Each gradient has the shape of its "w.r.t." variable, so there is no dimension along which the first gradient corresponds to the second one. What am I missing, and how can I do this?
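To make the shape problem concrete, here is a minimal NumPy sketch (toy sizes, hand-written derivatives, not TensorFlow) of the forward graph above with y0 = 0. The chain rule needs the full Jacobian of the intermediate tensor h w.r.t. W, contracted against dloss/dh - not an elementwise product of two gradients:

```python
import numpy as np

# Toy shapes: W is (3, 2), x is (2,), b is (3,)
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 2))
x = rng.normal(size=2)
b = rng.normal(size=3)
y0 = np.zeros(3)

h = W @ x + b            # intermediate tensor, shape (3,)
y = np.tanh(h)
# loss = mean((y - y0)**2)

# g1 = dloss/dh, shape (3,) -- same shape as h
g1 = (2.0 / 3) * (y - y0) * (1 - y**2)

# Jacobian dh/dW has shape (3, 3, 2): dh[i]/dW[j, k] = delta_ij * x[k]
J = np.zeros((3, 3, 2))
for i in range(3):
    J[i, i, :] = x

# Chain rule: contract over h's dimension -- NOT an elementwise product
g = np.einsum('i,ijk->jk', g1, J)    # shape (3, 2), same as W

# Direct gradient for comparison: dloss/dW = outer(dloss/dh, x)
g_direct = np.outer(g1, x)
assert np.allclose(g, g_direct)
```

The shapes g1: (3,) and J: (3, 3, 2) only combine through a contraction over h's axis, which is exactly the summation that the plain g1*g2 product is missing.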

Thanks.

Upvotes: 2

Views: 1352

Answers (3)

akonsk

Reputation: 76

For future readers:

TensorFlow has advanced since this was asked. As of TF 2.7 (and possibly earlier versions) you can use tf.GradientTape.jacobian to avoid the sum over the target's dimensions.

https://www.tensorflow.org/guide/advanced_autodiff#jacobians
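A minimal sketch (assuming TF 2.x eager mode; shapes chosen to mirror the other answer's example) of computing the two parts with GradientTape and combining them by matrix multiplication:

```python
import tensorflow as tf

x = tf.ones([1, 10])
w = tf.Variable(tf.fill([10, 5], 0.5))

with tf.GradientTape(persistent=True) as tape:
    tape.watch(x)                          # x is not a Variable, so watch it
    out = tf.matmul(x, w)                  # shape [1, 5]
    loss = tf.reduce_mean(tf.square(out))  # target of zeros folded in

g1 = tape.gradient(loss, out)              # d(loss)/d(out), shape [1, 5]
jac = tape.jacobian(out, x)                # full Jacobian, shape [1, 5, 1, 10]
jac = tf.reshape(jac, [5, 10])             # drop the size-1 batch dims

grad_by_parts = tf.matmul(g1, jac)         # chain rule: [1, 5] @ [5, 10] -> [1, 10]
grad_direct = tape.gradient(loss, x)       # reference gradient, shape [1, 10]
```

Because tape.jacobian keeps one axis per output element instead of summing over them, the two parts multiply together cleanly.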

Upvotes: 0

BlueSun

Reputation: 3570

tf.gradients sums over the dimensions of the tensor you differentiate. To avoid this you have to split that tensor into scalars and apply tf.gradients to each of them:

import tensorflow as tf

x = tf.ones([1, 10])

w = tf.get_variable("w", initializer=tf.constant(0.5, shape=[10, 5]))
out = tf.matmul(x, w)                    # shape [1, 5]
out_target = tf.constant(0., shape=[5])

loss = tf.reduce_mean(tf.square(out - out_target))

# Direct gradient d(loss)/dx, for reference
grad = tf.gradients(loss, x)

# Part 1: d(loss)/d(out)
part_grad_1 = tf.gradients(loss, out)
# Part 2: differentiate each scalar component of `out` separately so the
# per-component gradients are not summed; together they form the Jacobian
part_grad_2 = tf.concat([tf.gradients(i, x) for i in tf.split(out, 5, axis=1)], axis=1)

# Chain rule: multiply the two parts back together
grad_by_parts = tf.matmul(part_grad_1, part_grad_2)

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    print(sess.run([grad]))
    print(sess.run([grad_by_parts]))

Upvotes: 1

P-Gn

Reputation: 24581

From the docs, tf.gradients (emphasis mine)

constructs symbolic derivatives of sum of ys w.r.t. x in xs.

If any tensor in ys is multidimensional, it is reduce_summed before the resulting list of scalars is itself summed, before being differentiated. This is why the output gradient has the same size as the xs.

This also explains why losses can be multidimensional in TensorFlow: they are implicitly summed over before differentiation.
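A quick illustration of that summing behavior, sketched here with TF 2.x's GradientTape, which has the same semantics for non-scalar targets:

```python
import tensorflow as tf

x = tf.Variable([1.0, 2.0, 3.0])
with tf.GradientTape() as tape:
    y = x ** 2                 # non-scalar "loss", shape [3]

# The gradient of a non-scalar target is the gradient of its sum,
# so the result has the shape of x, not a Jacobian over (y, x).
g = tape.gradient(y, x)        # d(sum(y))/dx = 2 * x
```

No error is raised for the vector-valued y - the implicit sum simply makes it scalar first, which is exactly why the question's g1 and g2 came out with mismatched shapes.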

Upvotes: 0
