Reputation: 5763
I am training a neural network that incorporates a mean teacher. The process is as follows:

1. Take a supervised architecture and make a copy of it. Call the original model the student and the new one the teacher.
2. At each training step, feed the same minibatch to both the student and the teacher, but apply random augmentation or noise to the inputs of each separately. Add an additional consistency cost between the student and teacher outputs (after softmax).
3. Let the optimizer update the student weights normally.
4. Let the teacher weights be an exponential moving average (EMA) of the student weights; that is, after each training step, move the teacher weights a little toward the student weights. (A rough sketch of these steps is shown after this list.)
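For reference, this is roughly how I picture the training step, written as a TF2/Keras sketch (the model, the `augment` helper, and all hyperparameter values here are placeholders, not my actual setup):

```python
import tensorflow as tf

def make_model():
    # placeholder architecture; the real one is the supervised model being copied
    return tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10),                      # logits
    ])

student = make_model()
teacher = make_model()
student.build((None, 784))
teacher.build((None, 784))
teacher.set_weights(student.get_weights())              # teacher starts as an exact copy

optimizer = tf.keras.optimizers.Adam(1e-3)
ema_decay = 0.99
consistency_weight = 1.0

def augment(x):
    # stand-in for real augmentation: independent noise for each branch
    return x + tf.random.normal(tf.shape(x), stddev=0.1)

@tf.function
def train_step(x, y):
    with tf.GradientTape() as tape:
        student_logits = student(augment(x), training=True)
        teacher_logits = teacher(augment(x), training=True)   # separately noised input

        class_loss = tf.reduce_mean(
            tf.keras.losses.sparse_categorical_crossentropy(
                y, student_logits, from_logits=True))

        # consistency cost between the softmax outputs; teacher output is a target
        consistency = tf.reduce_mean(tf.square(
            tf.nn.softmax(student_logits)
            - tf.stop_gradient(tf.nn.softmax(teacher_logits))))

        loss = class_loss + consistency_weight * consistency

    # step 3: the optimizer updates only the student weights
    grads = tape.gradient(loss, student.trainable_variables)
    optimizer.apply_gradients(zip(grads, student.trainable_variables))

    # step 4: teacher weights follow the student as an exponential moving average
    for t_var, s_var in zip(teacher.weights, student.weights):
        t_var.assign(ema_decay * t_var + (1.0 - ema_decay) * s_var)

    return loss
```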
Also, the TensorFlow documentation says the EMA variables are created with trainable=False and added to the GraphKeys.ALL_VARIABLES collection. Since they are not trainable, they won't have gradients applied to them; I understand that. But they depend on the current trainable variables of the graph, and hence so do the predictions of the teacher network. Will an additional gradient flow to the trainable variables because the EMA depends on them? In general, do non-trainable variables pass gradients through them?
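To make that concrete, this is the kind of shadow-variable behaviour I mean, using tf.train.ExponentialMovingAverage run eagerly (the variable names and values are just an example):

```python
import tensorflow as tf

w = tf.Variable(3.0, name="student_weight")     # trainable=True by default
ema = tf.train.ExponentialMovingAverage(decay=0.99)

ema.apply([w])                 # creates the non-trainable shadow ("teacher") variable
shadow = ema.average(w)        # the EMA copy of w

print(shadow.trainable)        # False -> the optimizer never updates it directly
w.assign(5.0)
ema.apply([w])                 # shadow moves a little toward the new value of w
print(shadow.numpy())          # ≈ 3.02 with decay=0.99
```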
Upvotes: 2
Views: 776
Reputation: 11968
Yes. TLDR: everything that goes into the loss will generate gradients.
The flow works like this: if a variable is not trainable, the optimizer simply doesn't adjust it, but gradients are still propagated through the operations that use it.
As for the specific question, "will an additional gradient flow to the trainable variables because the EMA depends on them?": merely computing the EMA from other things in your graph does not change the gradients. If, however, the EMA result is incorporated into the loss, then it will generate gradients and propagate more gradients back through the graph to minimize the loss.
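A small demonstration of that point in TF2 eager mode (the values are made up): the non-trainable variable never gets an update of its own, but gradients still flow through the op that uses it to the trainable variable next to it.

```python
import tensorflow as tf

w = tf.Variable(2.0, trainable=True)        # stands in for a student weight
shadow = tf.Variable(2.0, trainable=False)  # stands in for an EMA / teacher value

with tf.GradientTape() as tape:
    y = w * shadow                  # the non-trainable variable sits in the loss path
    loss = (y - 3.0) ** 2

grad_w, grad_shadow = tape.gradient(loss, [w, shadow])
print(grad_w.numpy())   # 4.0 -> gradient reaches w *through* the op that uses shadow
print(grad_shadow)      # None -> not watched/trainable, so it is never adjusted
```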
Upvotes: 1