Reputation: 577
I have been learning TensorFlow 2.0 these days. I wrote a very simple model for testing. Specifically, I want to minimize the function x1^2 - 2*x1 + 1 = (x1 - 1)^2, which attains its minimum at x1 = 1. Instead of creating only the variable x1, I also created a second variable x2 = -2*x1 + 1, to see whether this approach still works when I have more complicated relationships between variables in the future. Here is my code:
import tensorflow as tf

opt = tf.keras.optimizers.SGD(learning_rate=0.1)
var1 = tf.Variable(tf.random.normal([1]))
var2 = tf.add(tf.multiply(-2, var1), 1)
loss = lambda: var1 * var1 + var2

for i in range(1000):
    opt.minimize(loss, var_list=[var1])
    print('var1: {}, var2: {}'.format(var1.numpy(), var2.numpy()))
The variable var1 quickly converges to 0, while var2 remains unchanged. So, where is the problem in my code?
Upvotes: 1
Views: 385
Reputation: 59701
The problem is that you are writing code as if you were in graph mode (TF 1.x). When you write the line:

var2 = tf.add(tf.multiply(-2, var1), 1)

var2 will be assigned a value (the initial random value of var1 times -2, plus one), and then it does not change anymore. Unlike in graph mode, where var2 would represent the symbolic computation -2 * var1 + 1, in eager mode it is just a value computed at the time that line of code is evaluated. This means that your loss function is really just computing var1 squared plus a constant, so its minimum is always reached when var1 equals zero.
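To see this concretely, here is a minimal sketch (reusing the names from your question, with an example starting value of 3.0) showing that in eager mode var2 keeps the value it was given at creation, even after var1 is updated:

import tensorflow as tf

var1 = tf.Variable([3.0])
var2 = tf.add(tf.multiply(-2, var1), 1)  # evaluated eagerly: -2 * 3.0 + 1 = -5.0
print(var2.numpy())                      # [-5.]

var1.assign([10.0])                      # update var1 afterwards...
print(var2.numpy())                      # still [-5.]: var2 is a frozen value, not a formula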
In TF 2.x you have to compute the loss on each training iteration, instead of expressing it symbolically once before the training loop as in TF 1.x. So the computation of var2 has to happen inside the loss function, for every new value of var1.
import tensorflow as tf

opt = tf.keras.optimizers.SGD(learning_rate=0.1)
var1 = tf.Variable(tf.random.normal([1]))

def loss():
    var2 = tf.add(tf.multiply(-2, var1), 1)
    return var1 * var1 + var2

for i in range(1000):
    opt.minimize(loss, var_list=[var1])
    print('var1: {}'.format(var1.numpy()))
# ...
# var1: [0.9999999]
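For comparison, the same loop can be written with an explicit tf.GradientTape instead of opt.minimize; this is just an equivalent sketch (assuming TF 2.x eager execution), and it makes the per-iteration recomputation of var2 visible:

import tensorflow as tf

opt = tf.keras.optimizers.SGD(learning_rate=0.1)
var1 = tf.Variable(tf.random.normal([1]))

for i in range(1000):
    with tf.GradientTape() as tape:
        var2 = tf.add(tf.multiply(-2, var1), 1)  # recomputed under the tape each step
        loss_value = var1 * var1 + var2
    grads = tape.gradient(loss_value, [var1])
    opt.apply_gradients(zip(grads, [var1]))

print('var1: {}'.format(var1.numpy()))  # converges to ~[1.0]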
Upvotes: 1