wfjiwi19024

Reputation: 13

TensorFlow: update only selected variables

Overview: I want to update only selected variables in a network. The network has two parts, A -> B (in the forward direction), and each part has its own loss, La and Lb. I want to train the weights a of A to optimize Lb. While doing this, the weights b of B should stay fixed. How can I do this?

Approach 1: Select only a as the variables to minimize via var_list in optimizer.minimize(loss, var_list=[a]). https://github.com/tensorflow/tensorflow/issues/834 . This crashes with the error ValueError: No gradients provided for any variable, check your graph for ops that do not support gradients, between variables (...) and loss (...). This approach works fine in other scenarios, but apparently it does not like that the weights b are not in the var_list.

Edit 1: The line that causes the error: a_optim = tf.train.AdamOptimizer(args.lr, beta1=args.beta1).minimize(self.a_loss, var_list=self.a_vars, global_step=self.global_step)

Approach 2: Same as Approach 1, but also include b in the var_list. The problem now is that the network updates both a and b, whereas it should only send the gradients through B and update A.

Edit 2: The line that works, but is not what I want: a_optim = tf.train.AdamOptimizer(args.lr, beta1=args.beta1).minimize(self.a_loss, var_list=self.a_vars+self.b_vars, global_step=self.global_step)

Approach 3: Use tf.stop_gradient(tensor) (Holding variables constant during optimizer). From the documentation I infer that this only stops the gradients from flowing further to the left in the graph, whereas I want to ignore the variables on the right.
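For reference, a minimal sketch of the behaviour I mean (variable names and shapes are made up): tf.stop_gradient cuts the gradient flow upstream of the tensor it wraps, so it would freeze a rather than b:

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 4])
w_a = tf.Variable(tf.random_normal([4, 4]), name="a")  # weights of part A
w_b = tf.Variable(tf.random_normal([4, 1]), name="b")  # weights of part B

a_out = tf.matmul(x, w_a)
# stop_gradient blocks gradients upstream (to the left) of a_out,
# so Lb can no longer reach w_a, while w_b still gets a gradient.
b_out = tf.matmul(tf.stop_gradient(a_out), w_b)
loss_b = tf.reduce_mean(tf.square(b_out))

print(tf.gradients(loss_b, [w_a, w_b]))  # [None, <Tensor ...>]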

Approach 4: Set tf.Variable(..., trainable=False) for the weights of B, but that looks very inflexible if I want to alternate training between A and B.
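A sketch of the kind of per-part variable selection I have in mind (the scope names "A" and "B" and the dense layers are placeholders, not my actual network):

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 4])

# Placeholder two-part network built under separate variable scopes.
with tf.variable_scope("A"):
    a_out = tf.layers.dense(x, 8)
with tf.variable_scope("B"):
    b_out = tf.layers.dense(a_out, 1)

b_loss = tf.reduce_mean(tf.square(b_out))  # stand-in for Lb

a_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope="A")
b_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope="B")

# One train op per part; alternate which op is run instead of flipping trainable flags.
train_a = tf.train.AdamOptimizer(1e-4).minimize(b_loss, var_list=a_vars)
train_b = tf.train.AdamOptimizer(1e-4).minimize(b_loss, var_list=b_vars)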

Upvotes: 1

Views: 2749

Answers (2)

unki

Reputation: 1024

I found that, for better control over which variables get updated during optimization, it is better to use the compute_gradients and apply_gradients approach.

compute_gradients returns a list of (gradient, variable) tuples. You can modify the returned gradient tensors however you want, and you can also select the subset of variables you want to update.

Then you pass the list of (gradient, variable) tuples that you want to update to apply_gradients.

Here are some examples:

optimizer = tf.train.AdamOptimizer(learning_rate=0.0001)
grads = optimizer.compute_gradients(your_cost_function)

# Modify the gradients g and/or keep only the (g, v) pairs whose variable v
# you actually want to update (vars_to_update is your own list, e.g. a_vars).
grad_list = [(g, v) for g, v in grads if v in vars_to_update]

train_op = optimizer.apply_gradients(grad_list)

Then, run your session.

sess.run(train_op, feed_dict={...})

Also, since you have two loss functions, you should create two separate train operations, as sketched below.
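For example, something along these lines (a_vars, b_vars and b_loss stand for your own variable lists and loss tensor; this is just a sketch):

import tensorflow as tf

opt_a = tf.train.AdamOptimizer(learning_rate=0.0001)
opt_b = tf.train.AdamOptimizer(learning_rate=0.0001)

# Train only A's weights on B's loss (gradients still flow through B).
grads_a = [(g, v) for g, v in opt_a.compute_gradients(b_loss) if v in a_vars and g is not None]
train_a_op = opt_a.apply_gradients(grads_a)

# Train only B's weights on its own loss.
grads_b = [(g, v) for g, v in opt_b.compute_gradients(b_loss) if v in b_vars and g is not None]
train_b_op = opt_b.apply_gradients(grads_b)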

Hope this helps!

Upvotes: 2

wfjiwi19024

Reputation: 13

It turns out that the final op in A was non-differentiable (tf.argmax), so gradients obviously could not be passed from B to A.
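A minimal illustration of the problem (made-up shapes):

import tensorflow as tf

x = tf.Variable(tf.random_normal([3, 5]))
idx = tf.argmax(x, axis=1)                       # argmax has no gradient
loss = tf.reduce_sum(tf.cast(idx, tf.float32))   # any loss built on top of it

print(tf.gradients(loss, [x]))  # [None] -> "No gradients provided for any variable"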

Upvotes: 0
