Reputation: 1824
For example, if I do:
loss_one = something
loss_two = something_else
train_one = tf.train.AdamOptimizer(0.001).minimize(loss_one)
train_two = tf.train.AdamOptimizer(0.001).minimize(loss_two)
sess.run([train_one, train_two])
Will that do what's expected? The reason I'm concerned is that I don't exactly know how gradients are accumulated. Are they stored on the optimizers themselves? Or on the variables? If it's the latter, I can imagine them interfering.
Upvotes: 1
Views: 935
Reputation: 59731
Most likely not. Presumably, both loss_one and loss_two are a measure of how close the output of your model, let's say out, is to what you expected, so they would both be a function of out and maybe something else. Both optimizers compute the variable updates from the value of out computed with the values that the variables had before calling session.run. So if you apply one update and then the other, the second update would not be really correct, because it has not been computed using the now-updated variables. This may not be a huge issue though, since each individual update is typically small.
A more complicated problem is that, depending on how exactly the optimizer is implemented, if it is something more or less like variable = variable + update, then it is not deterministic whether the variable on the right-hand side of the expression has the original or the first-updated value, so you could end up applying only one of the updates or both, non-deterministically.
There are several better alternatives:

- Run sess.run(train_one) first and sess.run(train_two) later.
- Minimize the sum of the losses (i.e. tf.train.AdamOptimizer(0.001).minimize(loss_one + loss_two)).
- Call compute_gradients from the optimizer for each loss value, combine the resulting gradients however you see fit (e.g. adding or averaging the updates) and apply them with apply_gradients (see the sketch after this list).
- Use tf.control_dependencies to make sure that one optimization step always takes place after the other (also sketched below). However, this means that using the second optimizer will always require using the first one (this could be worked around, maybe with tf.cond, but it's more of a hassle).
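For the last two alternatives, here is a rough sketch. It assumes loss_one, loss_two and the learning rate are defined as in the question, and it glosses over the fact that compute_gradients can return None gradients for variables a loss does not depend on.

import tensorflow as tf

# Combine gradients manually with compute_gradients / apply_gradients.
optimizer = tf.train.AdamOptimizer(0.001)
grads_one = optimizer.compute_gradients(loss_one)  # list of (gradient, variable) pairs
grads_two = optimizer.compute_gradients(loss_two)
# Add the gradients variable by variable; assumes both losses depend on the
# same variables in the same order (otherwise None entries need handling).
combined = [(g1 + g2, v1) for (g1, v1), (g2, v2) in zip(grads_one, grads_two)]
train_combined = optimizer.apply_gradients(combined)

# Or force one step to run after the other with tf.control_dependencies.
train_one = tf.train.AdamOptimizer(0.001).minimize(loss_one)
with tf.control_dependencies([train_one]):
    train_two = tf.train.AdamOptimizer(0.001).minimize(loss_two)
# Running train_two now always runs train_one first.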
Upvotes: 1
Reputation: 2322
The optimizer is mainly in charge of calculating the gradients (backpropagation). If you give it the loss twice (run it two times, as you are doing), it will apply the gradient updates twice while performing inference only once. Not sure why you would do that, though.
Upvotes: 0