Shuai

Reputation: 1153

What does `training=True` mean when calling a TensorFlow Keras model?

In TensorFlow's official documentation, they always pass training=True when calling a Keras model in a training loop, for example, logits = mnist_model(images, training=True).
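For context, here is a minimal sketch of the kind of custom training loop the docs show (the model, data, and names here are placeholders, not the docs' exact code):

    import tensorflow as tf

    # A small stand-in model; Dropout is a layer whose behavior depends on the mode.
    mnist_model = tf.keras.Sequential([
        tf.keras.Input(shape=(28, 28)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10),
    ])
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
    optimizer = tf.keras.optimizers.Adam()

    # Dummy data so the sketch runs end to end.
    dataset = tf.data.Dataset.from_tensor_slices(
        (tf.random.normal((64, 28, 28)),
         tf.random.uniform((64,), maxval=10, dtype=tf.int32))
    ).batch(32)

    for images, labels in dataset:
        with tf.GradientTape() as tape:
            logits = mnist_model(images, training=True)  # the argument in question
            loss = loss_fn(labels, logits)
        grads = tape.gradient(loss, mnist_model.trainable_variables)
        optimizer.apply_gradients(zip(grads, mnist_model.trainable_variables))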

I tried `help(tf.keras.Model.call)` and it shows:

Help on function call in module tensorflow.python.keras.engine.network:

call(self, inputs, training=None, mask=None)
    Calls the model on new inputs.

    In this case `call` just reapplies
    all ops in the graph to the new inputs
    (e.g. build a new computational graph from the provided inputs).

    Arguments:
        inputs: A tensor or list of tensors.
        training: Boolean or boolean scalar tensor, indicating whether to run
          the `Network` in training mode or inference mode.
        mask: A mask or list of masks. A mask can be
            either a tensor or None (no mask).

    Returns:
        A tensor if there is a single output, or
        a list of tensors if there are more than one outputs.

It says that training is a Boolean or boolean scalar tensor, indicating whether to run the Network in training mode or inference mode, but I couldn't find any information about these two modes.

In a nutshell, I don't know what the effect of this argument is. And what happens if I omit it when training?

Upvotes: 23

Views: 21786

Answers (2)

NahidEbrahimian

Reputation: 61

The training argument indicates whether the layer should behave in training mode or in inference mode. For a BatchNormalization layer, for example:

  • training=True: The layer will normalize its inputs using the mean and variance of the current batch of inputs.

  • training=False: The layer will normalize its inputs using the mean and variance of its moving statistics, learned during training.

Usually training=False in inference mode, but in some networks, such as the pix2pix cGAN, training=True is used at both training time and inference time.
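To see the difference concretely, here is a small runnable sketch (the layer and tensor names are illustrative):

    import tensorflow as tf

    bn = tf.keras.layers.BatchNormalization()
    x = tf.random.normal((4, 3)) * 10 + 5   # batch with nonzero mean and variance

    # training=True: normalize with the current batch's mean and variance,
    # and update the layer's moving statistics as a side effect.
    y_train = bn(x, training=True)

    # training=False: normalize with the stored moving statistics instead.
    y_infer = bn(x, training=False)

    print(tf.reduce_mean(y_train).numpy())  # close to 0: batch statistics were used
    print(tf.reduce_mean(y_infer).numpy())  # far from 0: moving stats are still near their initial values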

Upvotes: 4

xdurch0

Reputation: 10474

Some neural network layers, such as Dropout and BatchNormalization, behave differently during training and inference. For example:

  • During training, dropout will randomly drop out units and correspondingly scale up activations of the remaining units.
  • During inference, it does nothing (since you usually don't want the randomness of dropping out units here).

The training argument lets the layer know which of the two "paths" it should take. If you set this incorrectly, your network might not behave as expected.
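You can see dropout's two paths directly (a quick sketch; the rate and shapes are arbitrary):

    import tensorflow as tf

    drop = tf.keras.layers.Dropout(rate=0.5)
    x = tf.ones((1, 10))

    # training=True: each unit is zeroed with probability 0.5, and the
    # surviving units are scaled by 1 / (1 - rate) = 2.
    print(drop(x, training=True).numpy())   # e.g. [[2. 0. 2. 2. 0. ...]]

    # training=False: the layer is an identity function.
    print(drop(x, training=False).numpy())  # [[1. 1. 1. ... 1.]]

Note that model.fit, evaluate, and predict set this flag for you; it mainly matters when you call the model yourself in a custom training loop, as in the question.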

Upvotes: 34
