Reputation: 13025
There is a program that includes an optimization function, which has the following code segment to compute the gradients:
if hypes['clip_norm'] > 0:
    grads, tvars = zip(*grads_and_vars)
    clip_norm = hypes["clip_norm"]
    clipped_grads, norm = tf.clip_by_global_norm(grads, clip_norm)
    grads_and_vars = zip(clipped_grads, tvars)

print('grads_and_vars ', grads_and_vars)

train_op = opt.apply_gradients(grads_and_vars, global_step=global_step)

update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = opt.apply_gradients(grads_and_vars,
                                   global_step=global_step)
However, running the program raises the following error:
File "/home/FCN/kittiseg/hypes/../optimizer/generic_optimizer.py", line 92, in training
train_op = opt.apply_gradients(grads_and_vars, global_step=global_step)
File "tensorflow/tf_0.12/lib/python3.4/site-packages/tensorflow/python/training/optimizer.py", line 370, in apply_gradients
raise ValueError("No variables provided.")
ValueError: No variables provided.
I dug into the code, and I think the problem is caused by the variable grads_and_vars: when I print it out, all I get is
grads_and_vars <zip object at 0x2b0d6c27e348>
But I don't know how to analyze this object, or what causes
train_op = opt.apply_gradients(grads_and_vars, global_step=global_step)
to fail.
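(For reference: printing a zip object only shows its repr, not its contents. Something like the following hypothetical debugging snippet, not part of the original program, would show the actual (gradient, variable) pairs, keeping in mind that a zip object in Python 3 can only be iterated once.)

# Hypothetical debugging snippet -- not part of the original program.
# list() materializes the pairs so they can be printed, but it also
# consumes the zip iterator, which can only be traversed once in Python 3.
print(list(grads_and_vars))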
This is the original training function:
def training(hypes, loss, global_step, learning_rate, opt=None):
    """Sets up the training Ops.

    Creates a summarizer to track the loss over time in TensorBoard.
    Creates an optimizer and applies the gradients to all trainable variables.

    The Op returned by this function is what must be passed to the
    `sess.run()` call to cause the model to train.

    Args:
      loss: Loss tensor, from loss().
      global_step: Integer Variable counting the number of training steps
        processed.
      learning_rate: The learning rate to use for gradient descent.

    Returns:
      train_op: The Op for training.
    """
    # Add a scalar summary for the snapshot loss.''
    sol = hypes["solver"]
    hypes['tensors'] = {}
    hypes['tensors']['global_step'] = global_step
    total_loss = loss['total_loss']
    with tf.name_scope('training'):

        if opt is None:

            if sol['opt'] == 'RMS':
                opt = tf.train.RMSPropOptimizer(learning_rate=learning_rate,
                                                decay=0.9,
                                                epsilon=sol['epsilon'])
            elif sol['opt'] == 'Adam':
                opt = tf.train.AdamOptimizer(learning_rate=learning_rate,
                                             epsilon=sol['adam_eps'])
            elif sol['opt'] == 'SGD':
                lr = learning_rate
                opt = tf.train.GradientDescentOptimizer(learning_rate=lr)
            else:
                raise ValueError('Unrecognized opt type')

        hypes['opt'] = opt

        grads_and_vars = opt.compute_gradients(total_loss)

        if hypes['clip_norm'] > 0:
            grads, tvars = zip(*grads_and_vars)
            clip_norm = hypes["clip_norm"]
            clipped_grads, norm = tf.clip_by_global_norm(grads, clip_norm)
            grads_and_vars = zip(clipped_grads, tvars)

        train_op = opt.apply_gradients(grads_and_vars, global_step=global_step)

        update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
        with tf.control_dependencies(update_ops):
            train_op = opt.apply_gradients(grads_and_vars,
                                           global_step=global_step)

        return train_op
Upvotes: 0
Views: 1539
Reputation: 106
I believe that tf.clip_by_value has a different effect on the gradient values than tf.clip_by_global_norm.
Apparently tf.clip_by_value clips each gradient value independently into the clip range, while tf.clip_by_global_norm computes the total (global) norm of all gradient values and rescales every value by the same factor so that the global norm fits within the clip range, preserving the proportions between the gradient values.
To illustrate the difference between the two functions, let's say we have
original gradients = [2.0, 1.0, 2.0]
tf.clip_by_value(gradients, -1.0, 1.0) will turn the gradients into [1.0, 1.0, 1.0]
tf.clip_by_global_norm(gradients, 1.0) will rescale them by 1.0 / 3.0 (the global norm is sqrt(2^2 + 1^2 + 2^2) = 3.0), turning the gradients into roughly [0.67, 0.33, 0.67]
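A minimal sketch to verify this difference (assuming the TF 1.x / 0.12-style session API used elsewhere in this question; the toy gradient values are hypothetical):

import tensorflow as tf

# Toy gradients, matching the example above (hypothetical values).
grads = [tf.constant(2.0), tf.constant(1.0), tf.constant(2.0)]

# Element-wise clipping: each value is clamped into [-1, 1] independently.
clipped_by_value = [tf.clip_by_value(g, -1.0, 1.0) for g in grads]

# Global-norm clipping: all values are rescaled by clip_norm / global_norm.
clipped_by_norm, global_norm = tf.clip_by_global_norm(grads, clip_norm=1.0)

with tf.Session() as sess:
    print(sess.run(clipped_by_value))   # [1.0, 1.0, 1.0]
    print(sess.run(clipped_by_norm))    # approx. [0.67, 0.33, 0.67]
    print(sess.run(global_norm))        # 3.0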
To answer the original question, what worked for me was converting the zip object to a list, as below:
grads, tvars = zip(*grads_and_vars)
(clipped_grads, _) = tf.clip_by_global_norm(grads, clip_norm=1.0)
grads_and_vars = list(zip(clipped_grads, tvars))
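The reason this helps is that in Python 3 zip returns a one-shot iterator: once it has been consumed (here, by the first apply_gradients call before the with tf.control_dependencies(...) block), iterating over it again yields nothing, so the second apply_gradients call sees no (gradient, variable) pairs and raises "No variables provided." A quick plain-Python illustration (hypothetical values):

pairs = zip(['grad0', 'grad1'], ['var0', 'var1'])   # hypothetical stand-ins
print(list(pairs))   # [('grad0', 'var0'), ('grad1', 'var1')]
print(list(pairs))   # [] -- the iterator is exhausted after the first pass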
Upvotes: 0
Reputation: 164
Seems like a bug in the gradient clipping section. I had the same problem, did some research on how to do it properly (see the source below), and it seems to work now.
Replace the section
grads, tvars = zip(*grads_and_vars)
clip_norm = hypes["clip_norm"]
clipped_grads, norm = tf.clip_by_global_norm(grads, clip_norm)
grads_and_vars = zip(clipped_grads, tvars)
with
clip_norm = hypes["clip_norm"]
grads_and_vars = [(tf.clip_by_value(grad, -clip_norm, clip_norm), var)
                  for grad, var in grads_and_vars]
and it should work.
source: How to effectively apply gradient clipping in tensor flow?
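For context, this also sidesteps the error in the question, because the list comprehension produces a plain list that can be iterated more than once, unlike a zip object. A sketch of how the clipping section of training() might look with this change (the if grad is not None guard is my own addition, since compute_gradients can return None for variables that do not affect the loss):

grads_and_vars = opt.compute_gradients(total_loss)

if hypes['clip_norm'] > 0:
    clip_norm = hypes["clip_norm"]
    # Element-wise clipping; the result is a reusable list, not a one-shot iterator.
    grads_and_vars = [(tf.clip_by_value(grad, -clip_norm, clip_norm), var)
                      for grad, var in grads_and_vars
                      if grad is not None]

train_op = opt.apply_gradients(grads_and_vars, global_step=global_step)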
Upvotes: 1