Philipp Paier

Reputation: 33

Better understanding of training parameter for Keras-Model call method needed

I'd like to get a better understanding of the training parameter when calling a Keras model.

All tutorials (like here) explain that when you write a custom train step, you should call the model like this (because some layers may behave differently depending on whether you are training or doing inference):

pred = model(x, training=True)

and when you want to do inference, you should set training to False:

pred = model(x, training=False)

What I am wondering now is how this is affected by the creation of a functional model. Assume I have two models, base_model and head_model, and I want to create a new model out of them, where base_model should always be called with training=False (because I plan on freezing it, as in this tutorial here):

inputs = keras.Input(shape=(150, 150, 3))
x = base_model(inputs, training=False)
outputs = head_model(x)
new_model = keras.Model(inputs, outputs)

What will happen in such a case when I later call new_model(x_new, training=True)? Will the training=False set for base_model be overruled? Or will training now always remain False for base_model, regardless of what I pass to new_model? If the latter is the case, does that also mean that if I set e.g. outputs = head_model(inputs, training=True), this part of the new model would always run in training mode? And how does it work out if I don't pass any value for training at all, i.e. when I call the new model like this: new_model(x_new)?

Thanks in advance!

Upvotes: 3

Views: 2027

Answers (2)

lovetl2002

Reputation: 1066

Actually, the priority order of training is documented in Keras:

# Training mode for `Layer.call` is set via (in order of priority):
# (1) The `training` argument passed to this `Layer.call`, if it is not None
# (2) The training mode of an outer `Layer.call`.
# (3) The default mode set by `tf.keras.backend.set_learning_phase` (if set)
# (4) Any non-None default value for `training` specified in the call
#  signature
# (5) False (treating the layer as if it's in inference)

https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/keras/engine/base_layer.py

The scenario you described, i.e. setting training via the functional API, belongs to case (1). Just to be sure, we can run a simple test.

import tensorflow as tf
from tensorflow.keras.layers import Dropout

dropout = Dropout(0.99999)
x = [[1.0]]

dropout(x, training=True)
#0
dropout(x)
#1

The default training of the Dropout layer is None. Obviously, the dropout layer cannot remember its last training argument.

What about putting it in a model?

inputs = tf.keras.Input((1,))
outputs = dropout(inputs)
model = tf.keras.Model(inputs, outputs)
model(x, training=True)
#0
model(x, training=False)
#1

Without setting training manually, the behavior of dropout follows the model's training argument. Now try setting training when building the model:

outputs = dropout(inputs, training=True)
model = tf.keras.Model(inputs, outputs)
model(x, training=False)
#0

The dropout layer runs in training mode even though the model's training is False. Finally, without setting anything manually, does the layer's default value of training influence the model's behavior? Let's change dropout's default training to True:

class Dropout2(Dropout):

  def call(self, inputs, training=True):
    return super().call(inputs, training=training)

dropout = Dropout2(0.99999)
dropout(x)
#0

outputs = dropout(inputs)
model = tf.keras.Model(inputs, outputs)

model(x, training=False)
#1

We see that the model's training overrides the layer's default training.

So, just as the official documentation says, the priority order is: the layer's training set manually via the functional API > the outer layer's training (e.g. the model's training) > the layer's default training value.
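As a cross-check, the same priority order can be probed deterministically (Dropout is random, so the tests above can in principle misfire) with a tiny custom layer. Probe below is a made-up helper for illustration, not part of the Keras API:

```python
import tensorflow as tf
from tensorflow import keras

# Hypothetical probe layer with a non-None default (priority case 4):
# it adds 1.0 in training mode and 0.0 in inference mode.
class Probe(keras.layers.Layer):
    def call(self, inputs, training=True):
        return inputs + (1.0 if training else 0.0)

x = tf.zeros((1, 1))
inputs = keras.Input((1,))

# Case (1): training fixed at graph construction wins over the outer call.
m1 = keras.Model(inputs, Probe()(inputs, training=False))
r1 = m1(x, training=True)    # stays in inference mode -> 0.0

# Case (2): the outer model call's mode beats the layer's default.
m2 = keras.Model(inputs, Probe()(inputs))
r2 = m2(x, training=False)   # inference mode -> 0.0

# Case (4): nothing set from outside, so the default training=True applies.
r3 = Probe()(x)              # training mode -> 1.0

print(r1.numpy()[0, 0], r2.numpy()[0, 0], r3.numpy()[0, 0])
```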

Upvotes: 0

Innat

Reputation: 17239

training is a boolean argument that determines whether the call function runs in training mode or inference mode. For example, the Dropout layer is primarily used as a regularizer during model training, randomly dropping units, but at inference (prediction) time we don't want that to happen.

y = Dropout(0.5)(x, training=True) 

By this, we're setting training=True for the Dropout layer at training time. When we call .fit(), Keras sets this flag to True behind the scenes, and when we use evaluate or predict, it sets the flag to False. The same goes for a custom training loop: when we pass the input tensor to the model within the GradientTape scope, we can set this parameter ourselves. So, this training argument is set to True or False depending on whether we want layers to operate in training mode or inference mode, respectively.

# training mode 
with tf.GradientTape() as tape:
   logits = model(x, training=True) # forward pass

# inference mode 
logits = model(x, training=False)
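A quick way to watch fit() and predict() flipping this flag is a small recording layer. ModeProbe is a made-up helper for illustration; the Dense layer is only there so the model has trainable weights:

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras

# Hypothetical probe layer that records the mode it was last called in:
class ModeProbe(keras.layers.Layer):
    def call(self, inputs, training=None):
        self.last_mode = bool(training)
        return inputs

probe = ModeProbe()
inputs = keras.Input((1,))
outputs = keras.layers.Dense(1)(probe(inputs))
model = keras.Model(inputs, outputs)
model.compile(optimizer="sgd", loss="mse")

data = np.zeros((2, 1), dtype="float32")

model.fit(data, data, epochs=1, verbose=0)
mode_in_fit = probe.last_mode        # True: fit() runs in training mode

model.predict(data, verbose=0)
mode_in_predict = probe.last_mode    # False: predict() runs in inference mode

print(mode_in_fit, mode_in_predict)
```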

Now coming to your question. After defining the model

# Freeze the base_model
base_model.trainable = False

inputs = keras.Input(shape=(150, 150, 3))
x = base_model(inputs, training=False)
outputs = head_model(x)

new_model = keras.Model(inputs, outputs)

Now if you run this new model, whether with .fit() or a custom training loop, the base_model will always run in inference mode, because it was called with training=False.
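To make this concrete, here is a small self-contained sketch. The Dropout-only base and the ones-initialized Dense head are stand-ins for your base_model and head_model (assumptions for illustration); since Dropout in inference mode is the identity, the output is fully predictable even when the composed model is called with training=True:

```python
import tensorflow as tf
from tensorflow import keras

# Stand-in "base": just a Dropout layer wrapped in a Model.
base_inputs = keras.Input((4,))
base_model = keras.Model(base_inputs, keras.layers.Dropout(0.5)(base_inputs))

# Stand-in "head": a Dense layer with fixed weights so the output is predictable.
head_model = keras.layers.Dense(4, kernel_initializer="ones", use_bias=False)

base_model.trainable = False

inputs = keras.Input((4,))
x = base_model(inputs, training=False)  # base wired in inference mode
outputs = head_model(x)
new_model = keras.Model(inputs, outputs)

data = tf.ones((1, 4))
out = new_model(data, training=True)
# Dropout stays in inference mode (identity), so each output is sum(ones) = 4:
print(out.numpy())
```

If the base's Dropout had followed the outer training=True, roughly half the inputs would be zeroed and the rest scaled by 2, so the output would vary from call to call.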

Upvotes: 3
