kashkar

Reputation: 663

TensorFlow: How can I reuse Adam optimizer variables?

After recently upgrading my TensorFlow version, I am encountering this error which I am not able to solve:

Traceback (most recent call last):
  File "cross_train.py", line 177, in <module>
    train_network(use_gpu=True)
  File "cross_train.py", line 46, in train_network
    with tf.control_dependencies([s_opt.apply_gradients(s_grads), s_increment_step]):

...

ValueError: Variable image-conv1-layer/weights/Adam/ already exists, disallowed. Did you mean to set reuse=True in VarScope? Originally defined at:

  File "cross_train.py", line 34, in train_network
    with tf.control_dependencies([e_opt.apply_gradients(e_grads), e_increment_step]):
  File "cross_train.py", line 177, in <module>
    train_network(use_gpu=True)

My model architecture is three convolutional neural network branches: M, E, and S. During training I alternate steps: I propagate samples through M & E (the loss is the dot-product distance of their embeddings) and update with Adam, then propagate samples through M & S and update with Adam, and repeat. So M is shared and gets updated at every step, while the E and S branches alternate getting updated.

As such, I created two instances of AdamOptimizer (e_opt and s_opt), but I get the error because the Adam slot variable M-conv1/weights/Adam/ (created for the shared M weights by the first optimizer) already exists when I try to update the S branch.
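For reference, here is a stripped-down sketch of what I am doing (the variable and loss definitions below are placeholders, not my actual network); the second apply_gradients call is where the error is raised:

import tensorflow as tf

# Shared "M" weight touched by both losses (stands in for the shared conv branch)
with tf.variable_scope("M-conv1"):
    m_weights = tf.get_variable("weights", [4, 4],
                                initializer=tf.truncated_normal_initializer(stddev=0.1))

e_loss = tf.reduce_sum(tf.square(m_weights))  # placeholder for the M & E loss
s_loss = tf.reduce_sum(tf.abs(m_weights))     # placeholder for the M & S loss

e_opt = tf.train.AdamOptimizer(1e-4)  # first Adam instance
s_opt = tf.train.AdamOptimizer(1e-4)  # second Adam instance

# The first apply_gradients creates Adam's slot variables for the shared weight
e_train = e_opt.apply_gradients(e_opt.compute_gradients(e_loss))
# The second optimizer instance then tries to create slot variables for the same
# shared weight, and this is where the ValueError is raised on the new version
s_train = s_opt.apply_gradients(s_opt.compute_gradients(s_loss))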

This was not happening to me before I updated my TensorFlow version. I know how to enable variable reuse in TensorFlow generally, for example:

with tf.variable_scope(name, values=[input_to_layer]) as scope:
    try:
        # First call in this scope: create the variables
        weights = tf.get_variable("weights",
                                  [height, width, input_to_layer.get_shape()[3], channels],
                                  initializer=tf.truncated_normal_initializer(stddev=0.1, dtype=tf.float32))
        bias = tf.get_variable("bias", [channels],
                               initializer=tf.constant_initializer(0.0, dtype=tf.float32))
    except ValueError:
        # Variables already exist in this scope: enable reuse and fetch the existing ones
        scope.reuse_variables()
        weights = tf.get_variable("weights",
                                  [height, width, input_to_layer.get_shape()[3], channels],
                                  initializer=tf.truncated_normal_initializer(stddev=0.1, dtype=tf.float32))
        bias = tf.get_variable("bias", [channels],
                               initializer=tf.constant_initializer(0.0, dtype=tf.float32))

But I'm not sure if I can do the same for Adam. Any ideas? Help would be much appreciated.

Upvotes: 4

Views: 2809

Answers (1)

kashkar

Reputation: 663

Turns out I didn't need to instantiate two different Adam optimizers. I just created a single instance, and there is no name conflict and no need to share variables. I use the same optimizer regardless of which network branches are being updated:

e_grads = opt.compute_gradients(e_loss)
with tf.control_dependencies([opt.apply_gradients(e_grads), e_increment_step]):
    e_train = tf.no_op(name='english_train')

and...

s_grads = opt.compute_gradients(s_loss)
with tf.control_dependencies([opt.apply_gradients(s_grads), s_increment_step]):
    s_train = tf.no_op(name='spanish_train')

Interestingly, with the older version of TensorFlow there was no issue with using two Adam instances, even though the M branch slot names conflicted...
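For completeness, here is a self-contained sketch of the single-optimizer pattern (the shared weight, losses, and step counters below are placeholders for my actual branches):

import tensorflow as tf

# Shared "M" weight used by both branch losses (placeholder for the real network)
with tf.variable_scope("M-conv1"):
    m_weights = tf.get_variable("weights", [4, 4],
                                initializer=tf.truncated_normal_initializer(stddev=0.1))

e_loss = tf.reduce_sum(tf.square(m_weights))  # stands in for the M & E embedding loss
s_loss = tf.reduce_sum(tf.abs(m_weights))     # stands in for the M & S embedding loss

e_step = tf.Variable(0, trainable=False, name="e_step")
s_step = tf.Variable(0, trainable=False, name="s_step")
e_increment_step = tf.assign_add(e_step, 1)
s_increment_step = tf.assign_add(s_step, 1)

opt = tf.train.AdamOptimizer(1e-4)  # one instance shared by both update paths

e_grads = opt.compute_gradients(e_loss)
with tf.control_dependencies([opt.apply_gradients(e_grads), e_increment_step]):
    e_train = tf.no_op(name='english_train')

# Reusing the same instance means the Adam slot variables for the shared M weights
# are created only once, so the second apply_gradients causes no name clash
s_grads = opt.compute_gradients(s_loss)
with tf.control_dependencies([opt.apply_gradients(s_grads), s_increment_step]):
    s_train = tf.no_op(name='spanish_train')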

Upvotes: 3
