Sam Bobel

Reputation: 1824

Will using multiple minimizing ops at once work as expected in TensorFlow?

For example, if I do:

loss_one = something
loss_two = something_else
train_one = tf.train.AdamOptimizer(0.001).minimize(loss_one)
train_two = tf.train.AdamOptimizer(0.001).minimize(loss_two)
sess.run([train_one, train_two])

Will that do what's expected? The reason I'm concerned is because I don't exactly know how gradients are accumulated. Are they stored on the optimizers themselves? Or on the variables? If it's the second, I can imagine them interfering.

Upvotes: 1

Views: 935

Answers (2)

javidcf

Reputation: 59731

Most likely not. Presumably, both loss_one and loss_two are a measure of how close the output of your model, let's say out, is to what you expected, so they would both be a function of out and maybe something else. Both optimizers compute the variable updates from the out computed with the values that the variables had before calling session.run. So if you apply one update and then the other, the second update would not really be correct, because it has not been computed using the now-updated variables. This may not be a huge issue, though, since the individual updates are usually small.

A more complicated problem is that, depending on how exactly the optimizer is implemented, if it is something more or less like variable = variable + update, then it is not deterministic whether that variable on the right-hand side of the expression has the original or the first-updated value, so you could end up adding only one of the updates or both, non-deterministically.

There are several better alternatives:

  • Use only one optimizer at a time, so you call sess.run(train_one) first and sess.run(train_two) later.
  • Optimize the (possibly weighted) sum of both losses (tf.train.AdamOptimizer(0.001).minimize(loss_one + loss_two)).
  • Call compute_gradients from the optimizer for each loss value, combine the resulting gradients however you see fit (e.g. adding or averaging the updates) and apply them with apply_gradients (see the first sketch below).
  • Use tf.control_dependencies to make sure that one optimization step always takes place after the other (see the second sketch below). However, this means that using the second optimizer will always require using the first one (this could be worked around, maybe with tf.cond, but it's more of a hassle).
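
For the compute_gradients / apply_gradients option, a minimal TF 1.x-style sketch might look like the following. The toy variables and losses, and the choice to simply sum the per-variable gradients, are illustrative assumptions rather than anything from the question:

import tensorflow as tf

# Hypothetical toy model; loss_one and loss_two share the same trainable variable.
x = tf.placeholder(tf.float32, [None, 4])
y = tf.placeholder(tf.float32, [None, 1])
w = tf.Variable(tf.zeros([4, 1]))
out = tf.matmul(x, w)
loss_one = tf.reduce_mean(tf.square(out - y))
loss_two = tf.reduce_mean(tf.abs(out - y))

optimizer = tf.train.AdamOptimizer(0.001)

# Both gradient lists are computed from the same (pre-update) variable values.
grads_one = optimizer.compute_gradients(loss_one)
grads_two = optimizer.compute_gradients(loss_two)

# Both calls use the same default var_list, so the (gradient, variable) pairs
# line up one to one. Combine per variable (here: a simple sum) and apply once.
# In a real model you may also need to handle None gradients.
combined = [(g1 + g2, v1) for (g1, v1), (g2, v2) in zip(grads_one, grads_two)]
train_op = optimizer.apply_gradients(combined)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # sess.run(train_op, feed_dict={x: ..., y: ...})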

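And a minimal sketch of the tf.control_dependencies option, again with made-up losses; running train_two then always triggers train_one first:

import tensorflow as tf

# Made-up variable and losses, only to show the dependency structure.
w = tf.Variable(1.0)
loss_one = tf.square(w - 2.0)
loss_two = tf.abs(w + 1.0)

train_one = tf.train.AdamOptimizer(0.001).minimize(loss_one)
with tf.control_dependencies([train_one]):
    # Ops created here get a control dependency on train_one, so this
    # update is only applied after the first one has run.
    train_two = tf.train.AdamOptimizer(0.001).minimize(loss_two)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(train_two)  # implicitly runs train_one as well
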
Upvotes: 1

Eliethesaiyan

Reputation: 2322

The optimizer is mainly in charge of calculating the gradients (backpropagation). If you give it the loss twice (run it two times, as you are doing), it will apply the gradient updates twice while performing inference only once. Not sure why you would do that, though.

Upvotes: 0
