Gu Liqi

Reputation: 31

Tensorflow GradientTape does not trace optimizer.apply_gradients?

import tensorflow as tf

def f(x):
    return tf.multiply(x, x)

x = tf.Variable([3.])

with tf.GradientTape() as test_tape:
    test_tape.watch(x)

    with tf.GradientTape() as train_tape:
        train_tape.watch(x)
        fx = f(x)

    gradient = train_tape.gradient(fx, x)  # df(x)/dx = d(x^2)/dx = 2x
    x_prime = x.__copy__()  # x' = x
    x_prime = tf.subtract(x_prime, tf.multiply(gradient, 0.01))  # x' = x' - 0.01 * 2x = 0.98x
    fx_prime = f(x_prime)

gradient = test_tape.gradient(fx_prime, x)  # df(x')/dx = df(0.98x)/dx = 1.9208 * x = 5.7624
print(gradient)

I'm learning the TensorFlow 2.0 GradientTape() API and testing this code, which computes the gradient-through-a-gradient df(x')/dx, where x' = x - 0.01 * df(x)/dx. Given x = 3 and f(x) = x*x, the expected result is 5.7624, and the code above gets the right answer. Then I tried to replace the line

x_prime = tf.subtract(x_prime, tf.multiply(gradient, 0.01))

by

optimizer = tf.optimizers.SGD()
optimizer.apply_gradients(zip([gradient], [x_prime]))

and got the wrong answer, 5.88. I can't work out why, and I guess GradientTape does not trace apply_gradients? Does anybody know why?

python-3.7, tensorflow-2.0.0

Upvotes: 2

Views: 230

Answers (1)

Gu Liqi

Reputation: 31

OK, I found the answer myself. The optimizer.apply_gradients operation does not generate a node in the graph; it just changes the value in the variable's existing memory, so there is no connection between x_prime and the previous nodes. Besides, some other operations and functions also do not work inside GradientTape, such as tf.Variable().assign(), .assign_add(), .assign_sub(), tf.keras.layers.Layer.set_weights(), etc.
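To illustrate: the fix is to express the update as differentiable tensor ops (subtraction/multiplication) rather than an in-place variable mutation, so the outer tape can trace back through x_prime to x. A minimal sketch of the working pattern from the question, relying only on automatic watching of tf.Variable:

```python
import tensorflow as tf

x = tf.Variable([3.0])

with tf.GradientTape() as outer_tape:
    with tf.GradientTape() as inner_tape:
        fx = x * x                       # f(x) = x^2
    grad = inner_tape.gradient(fx, x)    # df/dx = 2x = 6.0

    # Traced update: x' = x - 0.01 * 2x = 0.98x.
    # Because this is a tensor op (not apply_gradients / assign_sub),
    # the outer tape records the dependency of x_prime on x.
    x_prime = x - 0.01 * grad
    fx_prime = x_prime * x_prime         # f(x') = (0.98x)^2

# d f(x') / dx = 2 * 0.98^2 * x = 5.7624
second = outer_tape.gradient(fx_prime, x)
print(second)
```

If the update is instead applied with apply_gradients or assign_sub, x_prime's new value exists only in the variable's memory, the tape sees no op connecting it to x, and the outer gradient is computed as if x' were an independent leaf (giving 2 * 2.94 = 5.88 in the question's example).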

Upvotes: 1
