Reputation: 33
I'd like to get a better understanding of the parameter training when calling a Keras model.
In all tutorials (like here) it is explained that when you are writing a custom train step, you should call the model like this (because some layers may behave differently depending on whether you are doing training or inference):
pred = model(x, training=True)
and when you want to do inference, you should set training to false:
pred = model(x, training=False)
What I am wondering now is how this is affected by the creation of a functional model. Assume I have two models, base_model and head_model, and I want to create a new model out of those two, where base_model should always be called with training=False
(because I plan on freezing it, as in this tutorial here):
inputs = keras.Input(shape=(150, 150, 3))
x = base_model(inputs, training=False)
outputs = head_model(x)
new_model = keras.Model(inputs, outputs)
What will happen in such a case when I later call new_model(x_new, training=True)? Will the training=False set for the base_model be overruled? Or will training now always be True for the base_model, regardless of what I pass to the new_model? If the latter is the case, does that also mean that if I set e.g. outputs = head_model(inputs, training=True), that part of the new model would always run in training mode? And how would it work out if I don't give any specific value for training and just run the new model like this: new_model(x_new)?
Thanks in advance!
Upvotes: 3
Views: 2027
Reputation: 1066
Actually, the priority order of training is documented in Keras:
# Training mode for `Layer.call` is set via (in order of priority):
# (1) The `training` argument passed to this `Layer.call`, if it is not None
# (2) The training mode of an outer `Layer.call`.
# (3) The default mode set by `tf.keras.backend.set_learning_phase` (if set)
# (4) Any non-None default value for `training` specified in the call
#     signature
# (5) False (treating the layer as if it's in inference)
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/keras/engine/base_layer.py
The scenario you described, i.e. setting training via the functional API, belongs to case (1). Just in case, we can do a simple test.
import tensorflow as tf
from tensorflow.keras.layers import Dropout

dropout = Dropout(0.99999)
x = [[1.0]]
dropout(x, training=True)
#0 (training mode: the input is almost surely dropped)
dropout(x)
#1 (inference mode: dropout is a no-op)
The default training of the Dropout layer is None. Obviously, a layer called eagerly cannot remember its last training argument, so the second call falls back to inference mode (case (5)).
What about putting it in a model?
inputs = tf.keras.Input((1,))
outputs = dropout(inputs)  # no training argument set here
model = tf.keras.Model(inputs, outputs)
model(x, training=True)
#0 (follows the model's training=True)
model(x, training=False)
#1 (follows the model's training=False)
Without setting training manually, the behavior of dropout follows the training of the model (case (2)). Now try setting training when building the model:
outputs = dropout(inputs, training=True)  # fixed when building the graph
model = tf.keras.Model(inputs, outputs)
model(x, training=False)
#0 (still training mode)
The dropout layer runs in training mode even though the model's training is False.
Finally, without setting it manually, does the layer's default training value influence the model's behavior? Let's change dropout's default training to True:
class Dropout2(Dropout):
    def call(self, inputs, training=True):  # default changed from None to True
        return super().call(inputs, training=training)

dropout = Dropout2(0.99999)
dropout(x)
#0 (the non-None default applies: case (4))
outputs = dropout(inputs)
model = tf.keras.Model(inputs, outputs)
model(x, training=False)
#1 (the model's training=False wins: case (2))
We see that the model's training overrides the layer's default training.
So, just as the official documentation says, the priority order is: the layer's training manually set via the functional API > an outer layer's training (e.g. the model's training) > the layer's default training value.
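For completeness, case (3) can be checked too. A minimal sketch, assuming a TF version where tf.keras.backend.set_learning_phase is still available (it is deprecated in recent releases):
tf.keras.backend.set_learning_phase(1)  # globally force training mode
dropout = Dropout(0.99999)  # fresh layer with the default training=None
dropout(x)
#0 (training mode via the global learning phase)
tf.keras.backend.set_learning_phase(0)  # reset to inference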
Upvotes: 0
Reputation: 17239
training is a boolean argument that determines whether the call function runs in training mode or inference mode. For example, the Dropout layer is primarily used as a regularizer during model training, randomly dropping units, but at inference or prediction time we don't want that to happen.
y = Dropout(0.5)(x, training=True)
By this, we're setting training=True for the Dropout layer at training time. When we call .fit(), Keras sets this flag to True behind the scenes, and when we use evaluate or predict, it sets the flag to False. The same idea applies to a custom training loop: when we pass the input tensor to the model within the GradientTape scope, we set this parameter ourselves, and likewise at inference time. So, the training argument is set to True or False depending on whether we want the layers to operate in training mode or inference mode, respectively.
# training mode
with tf.GradientTape() as tape:
    logits = model(x, training=True)  # forward pass

# inference mode
al_logits = model(x, training=False)
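For reference, the built-in methods set this flag as follows; a minimal sketch, assuming model is compiled and x, y are suitable arrays:
model.compile(optimizer="adam", loss="mse")
model.fit(x, y, epochs=1)  # layers receive training=True
model.evaluate(x, y)       # layers receive training=False
model.predict(x)           # layers receive training=False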
Now, coming to your question. After defining the model:
# Freeze the base_model
base_model.trainable = False
inputs = keras.Input(shape=(150, 150, 3))
x = base_model(inputs, training=False)
outputs = head_model(x)
new_model = keras.Model(inputs, outputs)
Now, whether you run this new model via .fit() or a custom training loop, the base_model will always run in inference mode, because it was called with training=False.
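To make that concrete, here is a minimal sketch with toy stand-ins for base_model and head_model (the layer choices are hypothetical, picked only so the effect is visible):
import numpy as np
import tensorflow as tf
from tensorflow import keras

base_model = keras.Sequential([keras.layers.Dropout(0.99999)])  # near-certain dropout
head_model = keras.Sequential([keras.layers.Lambda(lambda t: t)])  # pass-through head

inputs = keras.Input(shape=(1,))
x = base_model(inputs, training=False)  # base locked to inference mode
outputs = head_model(x)
new_model = keras.Model(inputs, outputs)

x_new = np.ones((1, 1), dtype="float32")
print(new_model(x_new, training=True))  # [[1.]] -- dropout stays inactive
print(new_model(x_new))                 # [[1.]] -- still inactive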
Upvotes: 3