Reputation: 5763
I am training a neural network that incorporates a mean teacher. The process is as follows:

1. Take a supervised architecture and make a copy of it. Call the original model the student and the new one the teacher.
2. At each training step, feed the same minibatch to both the student and the teacher, but apply random augmentation or noise to the inputs of each separately. Add an additional consistency cost between the student and teacher outputs (after softmax).
3. Let the optimizer update the student weights normally.
4. Let the teacher weights be an exponential moving average (EMA) of the student weights; that is, after each training step, move the teacher weights a little toward the student weights. (A rough sketch of these steps is shown after this list.)
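For reference, this is roughly how I picture the training step, written as a TF2/Keras sketch (the model, the `augment` helper, and all hyperparameter values here are placeholders, not my actual setup):

```python
import tensorflow as tf

def make_model():
    # placeholder architecture; the real one is the supervised model being copied
    return tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10),                      # logits
    ])

student = make_model()
teacher = make_model()
student.build((None, 784))
teacher.build((None, 784))
teacher.set_weights(student.get_weights())              # teacher starts as an exact copy

optimizer = tf.keras.optimizers.Adam(1e-3)
ema_decay = 0.99
consistency_weight = 1.0

def augment(x):
    # stand-in for real augmentation: independent noise for each branch
    return x + tf.random.normal(tf.shape(x), stddev=0.1)

@tf.function
def train_step(x, y):
    with tf.GradientTape() as tape:
        student_logits = student(augment(x), training=True)
        teacher_logits = teacher(augment(x), training=True)   # separately noised input

        class_loss = tf.reduce_mean(
            tf.keras.losses.sparse_categorical_crossentropy(
                y, student_logits, from_logits=True))

        # consistency cost between the softmax outputs; teacher output is a target
        consistency = tf.reduce_mean(tf.square(
            tf.nn.softmax(student_logits)
            - tf.stop_gradient(tf.nn.softmax(teacher_logits))))

        loss = class_loss + consistency_weight * consistency

    # step 3: the optimizer updates only the student weights
    grads = tape.gradient(loss, student.trainable_variables)
    optimizer.apply_gradients(zip(grads, student.trainable_variables))

    # step 4: teacher weights follow the student as an exponential moving average
    for t_var, s_var in zip(teacher.weights, student.weights):
        t_var.assign(ema_decay * t_var + (1.0 - ema_decay) * s_var)

    return loss
```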
Also, the TensorFlow documentation says the EMA variables are created with trainable=False and added to the GraphKeys.ALL_VARIABLES collection. Since they are not trainable, they won't have gradients applied to them; I understand that. But they depend on the current trainable variables of the graph, and hence so do the predictions of the teacher network. Will an additional gradient flow to the trainable variables because the EMA depends on them? In general, do non-trainable variables pass gradients through them?
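To make that concrete, this is the kind of shadow-variable behaviour I mean, using tf.train.ExponentialMovingAverage run eagerly (the variable names and values are just an example):

```python
import tensorflow as tf

w = tf.Variable(3.0, name="student_weight")     # trainable=True by default
ema = tf.train.ExponentialMovingAverage(decay=0.99)

ema.apply([w])                 # creates the non-trainable shadow ("teacher") variable
shadow = ema.average(w)        # the EMA copy of w

print(shadow.trainable)        # False -> the optimizer never updates it directly
w.assign(5.0)
ema.apply([w])                 # shadow moves a little toward the new value of w
print(shadow.numpy())          # ≈ 3.02 with decay=0.99
```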
Upvotes: 2
Views: 776
Reputation: 11968
Yes. TLDR: everything that goes into the loss will generate gradients.
The flow works like this: if a variable is not trainable, the optimizer simply doesn't adjust it, but gradients are still propagated through the operations that use it.
As for the specific question, "will an additional gradient flow to the trainable variables because the EMA depends on them?": merely computing the EMA from other things in your graph does not change the gradients. If, however, the EMA result is incorporated into the loss, then it will generate gradients and propagate more gradients back through the graph to minimize the loss.
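A small demonstration of that point in TF2 eager mode (the values are made up): the non-trainable variable never gets an update of its own, but gradients still flow through the op that uses it to the trainable variable next to it.

```python
import tensorflow as tf

w = tf.Variable(2.0, trainable=True)        # stands in for a student weight
shadow = tf.Variable(2.0, trainable=False)  # stands in for an EMA / teacher value

with tf.GradientTape() as tape:
    y = w * shadow                  # the non-trainable variable sits in the loss path
    loss = (y - 3.0) ** 2

grad_w, grad_shadow = tape.gradient(loss, [w, shadow])
print(grad_w.numpy())   # 4.0 -> gradient reaches w *through* the op that uses shadow
print(grad_shadow)      # None -> not watched/trainable, so it is never adjusted
```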
Upvotes: 1