If an assign operation is applied to a weight tensor after that weight tensor is used in its portion of the forward pass of a network, does TensorFlow's backpropagation take into account the assign operation when determining the gradient for that weight? For example, if I have
weights = tf.Variable(...)
bias = tf.Variable(...)
output = tf.tanh(tf.matmul(weights, input) + bias)
weight_assign_op = weights.assign(weights + 1.0)
with tf.control_dependencies([weight_assign_op]):
    output2 = tf.identity(output)
the output is calculated, and then a change is made to the weights. If the output is then used to calculate a loss and gradients to update the variables, will the gradients be created taking into account the change to weights? That is, will the gradients for weights be the correct gradients for old_weights + 1.0, or will they still be the gradients for old_weights, which, when applied to the new weights, won't necessarily be "correct" gradients for gradient descent?
I ended up testing it experimentally. The gradient calculation does take the assign op into account. I used the code below to test. Running it as is results in a positive gradient. Commenting out the weight assign op line and the control dependency lines results in a negative gradient. This is because the gradient is being computed either at the original starting weight value of 0.0 or at the updated weight value of 2.0 after the assign.
import tensorflow as tf

data = [[1.0], [2.0], [3.0]]
labels = [[1.0], [2.1], [2.9]]

input_data = tf.placeholder(dtype=tf.float32, shape=[3, 1])
input_labels = tf.placeholder(dtype=tf.float32, shape=[3, 1])

weights = tf.Variable(tf.constant([0.0]))
bias = tf.Variable(tf.constant([0.0]))

# Forward pass uses the original weight value of 0.0.
output = (weights * input_data) + bias

# Assign a new value to the weights after they have been used above.
weight_assign_op = weights.assign(tf.constant([2.0]))

# Force the assign op to run before output is consumed downstream.
with tf.control_dependencies([weight_assign_op]):
    output = tf.identity(output)

loss = tf.reduce_sum(tf.norm(output - input_labels))
weight_gradient = tf.gradients(loss, weights)

initialize_op = tf.global_variables_initializer()
session = tf.Session()
session.run([initialize_op])
weight_gradient_value = session.run([weight_gradient], feed_dict={input_data: data, input_labels: labels})
print(weight_gradient_value)
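As a quick sanity check (my own addition, not part of the original experiment), the gradient of this loss can also be worked out by hand: with loss = ||w * x - y||, dloss/dw = sum((w * x_i - y_i) * x_i) / loss, which comes out to about -3.74 at w = 0.0 and about +3.74 at w = 2.0, matching the signs seen above. A minimal plain-Python sketch of that calculation (the analytic_gradient helper is hypothetical, introduced here only for illustration):

x = [1.0, 2.0, 3.0]
y = [1.0, 2.1, 2.9]

def analytic_gradient(w):
    # loss = ||w*x - y||_2, so dloss/dw = sum((w*x_i - y_i) * x_i) / loss
    residuals = [w * xi - yi for xi, yi in zip(x, y)]
    loss = sum(r * r for r in residuals) ** 0.5
    return sum(r * xi for r, xi in zip(residuals, x)) / loss

print(analytic_gradient(0.0))  # ~ -3.74: gradient at the original weight of 0.0
print(analytic_gradient(2.0))  # ~ +3.74: gradient at the assigned weight of 2.0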