figs_and_nuts

Reputation: 5763

Do gradients flow through non-trainable variables in TensorFlow?

I am training a neural network that incorporates a mean teacher. The process is as follows:

  1. Take a supervised architecture and make a copy of it. Let's call the original model the student and the new one the teacher.

  2. At each training step, use the same minibatch as inputs to both the student and the teacher but add random augmentation or noise to the inputs separately. Add an additional consistency cost between the student and teacher outputs (after softmax).

  3. Let the optimizer update the student weights normally.

  4. Let the teacher weights be an exponential moving average (EMA) of the student weights. That is, after each training step, update the teacher weights a little bit toward the student weights.
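A minimal sketch of step 4, written in TF 2.x / Keras style (the model names, decay value, and helper function are illustrative, not part of my actual setup):

    import tensorflow as tf

    # Illustrative sketch: after each optimizer step on the student,
    # nudge every teacher weight toward the matching student weight.
    # `student_model`, `teacher_model`, and `ema_decay` are placeholder names.
    ema_decay = 0.99

    def update_teacher(student_model, teacher_model, decay=ema_decay):
        for s_var, t_var in zip(student_model.weights, teacher_model.weights):
            # teacher <- decay * teacher + (1 - decay) * student
            t_var.assign(decay * t_var + (1.0 - decay) * s_var)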

Also, the TensorFlow documentation says the EMA variables are created with (trainable=False) and added to the GraphKeys.ALL_VARIABLES collection. Since they are not trainable, no gradient updates are applied to them; I understand that. But they depend on the current trainable variables of the graph, and hence so do the predictions of the teacher network. Will an additional gradient flow to the trainable variables because the EMA depends on them? In general, do non-trainable variables pass gradients through them?
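For reference, a quick check in TF 2.x eager mode (a sketch; the variable name is illustrative) showing that the shadow variable created by tf.train.ExponentialMovingAverage is indeed non-trainable:

    import tensorflow as tf

    # The student weight is trainable by default; applying the EMA creates a
    # separate non-trainable shadow variable that holds the moving average.
    w = tf.Variable(1.0, name="student_weight")
    ema = tf.train.ExponentialMovingAverage(decay=0.99)
    ema.apply([w])            # creates (and updates) the shadow variable
    shadow = ema.average(w)   # the EMA copy of w

    print(w.trainable)        # True
    print(shadow.trainable)   # False -- the optimizer never updates it directly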

Upvotes: 2

Views: 776

Answers (1)

Sorin

Reputation: 11968

Yes. TLDR: everything that goes into the loss will generate gradients.

The flow is like this:

  • Compute the loss.
  • Compute the gradients, usually in the direction that lowers the loss. The gradients are backpropagated through your model.
  • Take the trainable variables and adjust them based on the gradients and the optimization algorithm.

If a variable is not trainable, it is not adjusted, but gradients are still propagated through it.
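A small illustration of that point in TF 2.x eager mode (toy values, not from the question):

    import tensorflow as tf

    x = tf.Variable(2.0)                        # trainable
    frozen = tf.Variable(3.0, trainable=False)  # non-trainable

    with tf.GradientTape() as tape:
        loss = (x * frozen) ** 2

    grads = tape.gradient(loss, [x, frozen])
    print(grads[0])  # 36.0: d(loss)/dx = 2 * x * frozen^2, the gradient flows through `frozen`
    print(grads[1])  # None: the tape does not watch non-trainable variables by default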

will an additional gradient flow to the trainable variables because of ema being dependent upon them?

Merely computing the EMA from other things in your graph won't change the gradients. If, however, the result is incorporated into the loss, it will generate gradients and propagate additional gradients to optimize the loss.
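As a toy illustration (TF 2.x eager, scalar stand-ins for the two networks; the names and values are made up): the EMA update itself is a plain assignment and contributes nothing to the gradient, but once the teacher's prediction appears in the consistency loss, the student weight receives an extra gradient from that term:

    import tensorflow as tf

    student_w = tf.Variable(1.0)                   # trainable student "weight"
    teacher_w = tf.Variable(0.5, trainable=False)  # EMA copy, non-trainable

    # EMA update: a plain assignment, never recorded on the tape.
    teacher_w.assign(0.99 * teacher_w + 0.01 * student_w)

    x = tf.constant(2.0)
    with tf.GradientTape() as tape:
        student_pred = student_w * x
        teacher_pred = teacher_w * x               # uses only the non-trainable copy
        consistency_loss = (student_pred - teacher_pred) ** 2

    grads = tape.gradient(consistency_loss, [student_w, teacher_w])
    print(grads[0])  # about 3.96: the consistency cost adds gradient to the student weight
    print(grads[1])  # None: no gradient reaches the non-trainable EMA copy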

Upvotes: 1
