Jimmy2027

Reputation: 333

How to reset Keras metrics?

For parameter tuning, I'd like to loop over a training function with Keras. However, I noticed that when using tensorflow.keras.metrics.AUC() as a metric, an integer gets appended to the auc metric name on every training loop (e.g. auc_1, auc_2, ...). So the Keras metrics are apparently stored somewhere even after the training function returns.

This, first of all, causes the callbacks to no longer recognize the metric, and it also makes me wonder whether other things are stored as well, such as the model weights.

How can I reset the metrics, and are there other things stored by Keras that I need to reset to get a clean restart for training?

Below you can find a minimal working example:

Edit: this example seems to only work with TensorFlow 2.2.

import numpy as np
import tensorflow as tf
import tensorflow.keras as keras
from tensorflow.keras.metrics import AUC


def dummy_network(input_shape):
    model = keras.Sequential()
    model.add(keras.layers.Dense(10,
                                 input_shape=input_shape,
                                 activation=tf.nn.relu,
                                 kernel_initializer='he_normal',
                                 kernel_regularizer=keras.regularizers.l2(l=1e-3)))

    model.add(keras.layers.Flatten())
    model.add(keras.layers.Dense(11, activation='sigmoid'))

    model.compile(optimizer='adagrad',
                  loss='binary_crossentropy',
                  metrics=[AUC()])
    return model


def train():
    CB_lr = tf.keras.callbacks.ReduceLROnPlateau(
        monitor="val_auc",
        patience=3,
        verbose=1,
        mode="max",
        min_delta=0.0001,
        min_lr=1e-6)

    CB_es = tf.keras.callbacks.EarlyStopping(
        monitor="val_auc",
        min_delta=0.00001,
        verbose=1,
        patience=10,
        mode="max",
        restore_best_weights=True)
    callbacks = [CB_lr, CB_es]
    y = [np.ones((11, 1)) for _ in range(1000)]
    x = [np.ones((37, 12, 1)) for _ in range(1000)]
    dummy_dataset = tf.data.Dataset.from_tensor_slices((x, y)).batch(batch_size=100).repeat()
    val_dataset = tf.data.Dataset.from_tensor_slices((x, y)).batch(batch_size=100).repeat()
    model = dummy_network(input_shape=((37, 12, 1)))
    model.fit(dummy_dataset, validation_data=val_dataset, epochs=2,
              steps_per_epoch=len(x) // 100,
              validation_steps=len(x) // 100, callbacks=callbacks)


for i in range(3):
    print(f'\n\n **** Loop {i} **** \n\n')
    train()

The output is:

 **** Loop 0 **** 


2020-06-16 14:37:46.621264: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f991e541f10 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-06-16 14:37:46.621296: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
Epoch 1/2
10/10 [==============================] - 0s 44ms/step - loss: 0.1295 - auc: 0.0000e+00 - val_loss: 0.0310 - val_auc: 0.0000e+00 - lr: 0.0010
Epoch 2/2
10/10 [==============================] - 0s 10ms/step - loss: 0.0262 - auc: 0.0000e+00 - val_loss: 0.0223 - val_auc: 0.0000e+00 - lr: 0.0010


 **** Loop 1 **** 


Epoch 1/2
10/10 [==============================] - ETA: 0s - loss: 0.4751 - auc_1: 0.0000e+00WARNING:tensorflow:Reduce LR on plateau conditioned on metric `val_auc` which is not available. Available metrics are: loss,auc_1,val_loss,val_auc_1,lr
WARNING:tensorflow:Early stopping conditioned on metric `val_auc` which is not available. Available metrics are: loss,auc_1,val_loss,val_auc_1,lr
10/10 [==============================] - 0s 36ms/step - loss: 0.4751 - auc_1: 0.0000e+00 - val_loss: 0.3137 - val_auc_1: 0.0000e+00 - lr: 0.0010
Epoch 2/2
10/10 [==============================] - ETA: 0s - loss: 0.2617 - auc_1: 0.0000e+00WARNING:tensorflow:Reduce LR on plateau conditioned on metric `val_auc` which is not available. Available metrics are: loss,auc_1,val_loss,val_auc_1,lr
WARNING:tensorflow:Early stopping conditioned on metric `val_auc` which is not available. Available metrics are: loss,auc_1,val_loss,val_auc_1,lr
10/10 [==============================] - 0s 10ms/step - loss: 0.2617 - auc_1: 0.0000e+00 - val_loss: 0.2137 - val_auc_1: 0.0000e+00 - lr: 0.0010


 **** Loop 2 **** 


Epoch 1/2
10/10 [==============================] - ETA: 0s - loss: 0.1948 - auc_2: 0.0000e+00WARNING:tensorflow:Reduce LR on plateau conditioned on metric `val_auc` which is not available. Available metrics are: loss,auc_2,val_loss,val_auc_2,lr
WARNING:tensorflow:Early stopping conditioned on metric `val_auc` which is not available. Available metrics are: loss,auc_2,val_loss,val_auc_2,lr
10/10 [==============================] - 0s 34ms/step - loss: 0.1948 - auc_2: 0.0000e+00 - val_loss: 0.0517 - val_auc_2: 0.0000e+00 - lr: 0.0010
Epoch 2/2
10/10 [==============================] - ETA: 0s - loss: 0.0445 - auc_2: 0.0000e+00WARNING:tensorflow:Reduce LR on plateau conditioned on metric `val_auc` which is not available. Available metrics are: loss,auc_2,val_loss,val_auc_2,lr
WARNING:tensorflow:Early stopping conditioned on metric `val_auc` which is not available. Available metrics are: loss,auc_2,val_loss,val_auc_2,lr
10/10 [==============================] - 0s 10ms/step - loss: 0.0445 - auc_2: 0.0000e+00 - val_loss: 0.0389 - val_auc_2: 0.0000e+00 - lr: 0.0010

Upvotes: 8

Views: 4624

Answers (1)

Nicolas Gervais

Reputation: 36674

Your reproducible example failed in several places for me, so I changed just a few things (I'm using TF 2.1). After getting it to run, I was able to get rid of the additional metric names by specifying metrics=[AUC(name='auc')]. Here's the full (fixed) reproducible example:

import numpy as np
import tensorflow as tf
import tensorflow.keras as keras
from tensorflow.keras.metrics import AUC


def dummy_network(input_shape):
    model = keras.Sequential()
    model.add(keras.layers.Dense(10,
                                 input_shape=input_shape,
                                 activation=tf.nn.relu,
                                 kernel_initializer='he_normal',
                                 kernel_regularizer=keras.regularizers.l2(l=1e-3)))

    model.add(keras.layers.Flatten())
    model.add(keras.layers.Dense(11, activation='softmax'))

    model.compile(optimizer='adagrad',
                  loss='binary_crossentropy',
                  metrics=[AUC(name='auc')])
    return model


def train():
    CB_lr = tf.keras.callbacks.ReduceLROnPlateau(
        monitor="val_auc",
        patience=3,
        verbose=1,
        mode="max",
        min_delta=0.0001,
        min_lr=1e-6)

    CB_es = tf.keras.callbacks.EarlyStopping(
        monitor="val_auc",
        min_delta=0.00001,
        verbose=1,
        patience=10,
        mode="max",
        restore_best_weights=True)
    callbacks = [CB_lr, CB_es]
    y = tf.keras.utils.to_categorical([np.random.randint(0, 11) for _ in range(1000)])
    x = [np.ones((37, 12, 1)) for _ in range(1000)]
    dummy_dataset = tf.data.Dataset.from_tensor_slices((x, y)).batch(batch_size=100).repeat()
    val_dataset = tf.data.Dataset.from_tensor_slices((x, y)).batch(batch_size=100).repeat()
    model = dummy_network(input_shape=((37, 12, 1)))
    model.fit(dummy_dataset, validation_data=val_dataset, epochs=2,
              steps_per_epoch=len(x) // 100,
              validation_steps=len(x) // 100, callbacks=callbacks)


for i in range(3):
    print(f'\n\n **** Loop {i} **** \n\n')
    train()
The output is now:

Train for 10 steps, validate for 10 steps
Epoch 1/2
 1/10 [==>...........................] - ETA: 6s - loss: 0.3426 - auc: 0.4530
 7/10 [====================>.........] - ETA: 0s - loss: 0.3318 - auc: 0.4895
10/10 [==============================] - 1s 117ms/step - loss: 0.3301 - auc: 0.4893 - val_loss: 0.3222 - val_auc: 0.5085

This happens because on every loop iteration you created a new metric without specifying a name: metrics=[AUC()]. On the first iteration of the loop, TF automatically registered a variable in the name space called auc, but on the second iteration that name was already taken, so TF named the new metric auc_1. Your callbacks, however, were still set to monitor auc, which is a metric this model didn't have (it was the metric of the model from the previous loop). So you can either pass name='auc', so that every iteration reuses the same name, or define the metric once outside of the loop, like this:

import numpy as np
import tensorflow as tf
import tensorflow.keras as keras
from tensorflow.keras.metrics import AUC

auc = AUC()

def dummy_network(input_shape):
    model = keras.Sequential()
    model.add(keras.layers.Dense(10,
                                 input_shape=input_shape,
                                 activation=tf.nn.relu,
                                 kernel_initializer='he_normal',
                                 kernel_regularizer=keras.regularizers.l2(l=1e-3)))

    model.add(keras.layers.Flatten())
    model.add(keras.layers.Dense(11, activation='softmax'))
    model.compile(optimizer='adagrad',
                  loss='binary_crossentropy',
                  metrics=[auc])
    return model
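To see the naming behavior directly, here's a minimal sketch (assuming TF 2.x and a fresh session) of how Keras uniquifies unnamed metric instances within one Python process, while an explicit name is used verbatim:

import tensorflow as tf
from tensorflow.keras.metrics import AUC

print(AUC().name)            # 'auc'   -- first unnamed instance
print(AUC().name)            # 'auc_1' -- 'auc' is already taken, so a suffix is added
print(AUC(name='auc').name)  # 'auc'   -- an explicit name is reused as-is

Alternatively, calling tf.keras.backend.clear_session() at the top of train() should reset this global name registry (along with other graph-level state), so each loop would start from a fresh auc again.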

And don't worry about Keras resetting the metrics; it takes care of all that inside the fit() method. If you want more flexibility, or want to do it yourself, use a custom training loop and reset the metric state manually:

import numpy as np
import tensorflow as tf

auc = tf.keras.metrics.AUC()

auc.update_state(np.random.randint(0, 2, 10), np.random.randint(0, 2, 10))
print(auc.result())

auc.reset_states()
print(auc.result())
Out[6]: <tf.Tensor: shape=(), dtype=float32, numpy=0.875>  # state updated
Out[8]: <tf.Tensor: shape=(), dtype=float32, numpy=0.0>  # state reset
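
For example, here's a minimal custom-training-loop sketch (the tiny model and random data are made up just to illustrate the pattern) that resets the metric state at the start of every epoch:

import numpy as np
import tensorflow as tf

# Hypothetical toy model and data, only to demonstrate the reset pattern.
model = tf.keras.Sequential([tf.keras.layers.Dense(1, activation='sigmoid')])
loss_fn = tf.keras.losses.BinaryCrossentropy()
optimizer = tf.keras.optimizers.Adagrad()
auc = tf.keras.metrics.AUC(name='auc')

x = np.random.rand(100, 4).astype('float32')
y = np.random.randint(0, 2, (100, 1)).astype('float32')
dataset = tf.data.Dataset.from_tensor_slices((x, y)).batch(10)

for epoch in range(2):
    auc.reset_states()  # start each epoch with a clean metric state
    for x_batch, y_batch in dataset:
        with tf.GradientTape() as tape:
            y_pred = model(x_batch, training=True)
            loss = loss_fn(y_batch, y_pred)
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
        auc.update_state(y_batch, y_pred)  # accumulate over the epoch's batches
    print(f'epoch {epoch}: auc = {auc.result().numpy():.4f}')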

Upvotes: 6
