Reputation: 13025
There is a program that includes an optimization function, which has the following code segment to compute the gradients:
if hypes['clip_norm'] > 0:
    grads, tvars = zip(*grads_and_vars)
    clip_norm = hypes["clip_norm"]
    clipped_grads, norm = tf.clip_by_global_norm(grads, clip_norm)
    grads_and_vars = zip(clipped_grads, tvars)

print('grads_and_vars ', grads_and_vars)

train_op = opt.apply_gradients(grads_and_vars, global_step=global_step)

update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = opt.apply_gradients(grads_and_vars,
                                   global_step=global_step)
However, running the program raises the following error:
File "/home/FCN/kittiseg/hypes/../optimizer/generic_optimizer.py", line 92, in training
train_op = opt.apply_gradients(grads_and_vars, global_step=global_step)
File "tensorflow/tf_0.12/lib/python3.4/site-packages/tensorflow/python/training/optimizer.py", line 370, in apply_gradients
raise ValueError("No variables provided.")
ValueError: No variables provided.
I dug into the code, and I think the problem is caused by the variable grads_and_vars: when I print it out, all I get is
grads_and_vars <zip object at 0x2b0d6c27e348>
But I don't know how to analyze this object, or what causes
train_op = opt.apply_gradients(grads_and_vars, global_step=global_step)
to fail.
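(For reference: printing a zip object only shows its repr, not its contents. Something like the following hypothetical debugging snippet, not part of the original program, would show the actual (gradient, variable) pairs, keeping in mind that a zip object in Python 3 can only be iterated once.)

# Hypothetical debugging snippet -- not part of the original program.
# list() materializes the pairs so they can be printed, but it also
# consumes the zip iterator, which can only be traversed once in Python 3.
print(list(grads_and_vars))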
This is the original training function:
def training(hypes, loss, global_step, learning_rate, opt=None):
    """Sets up the training Ops.

    Creates a summarizer to track the loss over time in TensorBoard.
    Creates an optimizer and applies the gradients to all trainable variables.

    The Op returned by this function is what must be passed to the
    `sess.run()` call to cause the model to train.

    Args:
      loss: Loss tensor, from loss().
      global_step: Integer Variable counting the number of training steps
        processed.
      learning_rate: The learning rate to use for gradient descent.

    Returns:
      train_op: The Op for training.
    """
    # Add a scalar summary for the snapshot loss.''
    sol = hypes["solver"]
    hypes['tensors'] = {}
    hypes['tensors']['global_step'] = global_step
    total_loss = loss['total_loss']
    with tf.name_scope('training'):

        if opt is None:

            if sol['opt'] == 'RMS':
                opt = tf.train.RMSPropOptimizer(learning_rate=learning_rate,
                                                decay=0.9,
                                                epsilon=sol['epsilon'])
            elif sol['opt'] == 'Adam':
                opt = tf.train.AdamOptimizer(learning_rate=learning_rate,
                                             epsilon=sol['adam_eps'])
            elif sol['opt'] == 'SGD':
                lr = learning_rate
                opt = tf.train.GradientDescentOptimizer(learning_rate=lr)
            else:
                raise ValueError('Unrecognized opt type')

        hypes['opt'] = opt

        grads_and_vars = opt.compute_gradients(total_loss)

        if hypes['clip_norm'] > 0:
            grads, tvars = zip(*grads_and_vars)
            clip_norm = hypes["clip_norm"]
            clipped_grads, norm = tf.clip_by_global_norm(grads, clip_norm)
            grads_and_vars = zip(clipped_grads, tvars)

        train_op = opt.apply_gradients(grads_and_vars, global_step=global_step)

        update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
        with tf.control_dependencies(update_ops):
            train_op = opt.apply_gradients(grads_and_vars,
                                           global_step=global_step)

        return train_op
Upvotes: 0
Views: 1539
Reputation: 106
I believe that tf.clip_by_value has a different effect on the gradient values than tf.clip_by_global_norm.
Apparently tf.clip_by_value clips each gradient value independently into the clip range, while tf.clip_by_global_norm computes the total (global) norm of all gradient values and rescales every value by the same factor so that the global norm fits within the clip range, preserving the proportions between the gradient values.
To illustrate the difference between the two functions, let's say we have
original gradients = [2.0, 1.0, 2.0]
tf.clip_by_value(gradients, -1.0, 1.0) will turn the gradients into [1.0, 1.0, 1.0]
tf.clip_by_global_norm(gradients, 1.0) will rescale them by 1.0 / 3.0 (the global norm is sqrt(2^2 + 1^2 + 2^2) = 3.0), turning the gradients into roughly [0.67, 0.33, 0.67]
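A minimal sketch to verify this difference (assuming the TF 1.x / 0.12-style session API used elsewhere in this question; the toy gradient values are hypothetical):

import tensorflow as tf

# Toy gradients, matching the example above (hypothetical values).
grads = [tf.constant(2.0), tf.constant(1.0), tf.constant(2.0)]

# Element-wise clipping: each value is clamped into [-1, 1] independently.
clipped_by_value = [tf.clip_by_value(g, -1.0, 1.0) for g in grads]

# Global-norm clipping: all values are rescaled by clip_norm / global_norm.
clipped_by_norm, global_norm = tf.clip_by_global_norm(grads, clip_norm=1.0)

with tf.Session() as sess:
    print(sess.run(clipped_by_value))   # [1.0, 1.0, 1.0]
    print(sess.run(clipped_by_norm))    # approx. [0.67, 0.33, 0.67]
    print(sess.run(global_norm))        # 3.0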
To answer the original question, what worked for me was converting the zip object to a list, as below:
grads, tvars = zip(*grads_and_vars)
(clipped_grads, _) = tf.clip_by_global_norm(grads, clip_norm=1.0)
grads_and_vars = list(zip(clipped_grads, tvars))
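The reason this helps is that in Python 3 zip returns a one-shot iterator: once it has been consumed (here, by the first apply_gradients call before the with tf.control_dependencies(...) block), iterating over it again yields nothing, so the second apply_gradients call sees no (gradient, variable) pairs and raises "No variables provided." A quick plain-Python illustration (hypothetical values):

pairs = zip(['grad0', 'grad1'], ['var0', 'var1'])   # hypothetical stand-ins
print(list(pairs))   # [('grad0', 'var0'), ('grad1', 'var1')]
print(list(pairs))   # [] -- the iterator is exhausted after the first pass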
Upvotes: 0
Reputation: 164
Seems like a bug in the gradient clipping section. I had the same problem, did some research on how to do it properly (see the source below), and it seems to work now.
Replace the section
grads, tvars = zip(*grads_and_vars)
clip_norm = hypes["clip_norm"]
clipped_grads, norm = tf.clip_by_global_norm(grads, clip_norm)
grads_and_vars = zip(clipped_grads, tvars)
with
clip_norm = hypes["clip_norm"]
grads_and_vars = [(tf.clip_by_value(grad, -clip_norm, clip_norm), var)
                  for grad, var in grads_and_vars]
and it should work.
source: How to effectively apply gradient clipping in tensor flow?
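For context, this also sidesteps the error in the question, because the list comprehension produces a plain list that can be iterated more than once, unlike a zip object. A sketch of how the clipping section of training() might look with this change (the if grad is not None guard is my own addition, since compute_gradients can return None for variables that do not affect the loss):

grads_and_vars = opt.compute_gradients(total_loss)

if hypes['clip_norm'] > 0:
    clip_norm = hypes["clip_norm"]
    # Element-wise clipping; the result is a reusable list, not a one-shot iterator.
    grads_and_vars = [(tf.clip_by_value(grad, -clip_norm, clip_norm), var)
                      for grad, var in grads_and_vars
                      if grad is not None]

train_op = opt.apply_gradients(grads_and_vars, global_step=global_step)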
Upvotes: 1