Reputation: 36309
The documentation states that model.fit
returns a History
object which contains various metrics evaluated during training. These metrics are also printed to stdout during training (see this question for example).
The documentation states that the history object is
a record of training loss values and metrics values at successive epochs, [...]
Now I would like to know whether these metrics are given as an average per sample or as an average per batch. Suppose I have model.fit(x, y, batch_size=16, ...)
. Are the metrics accumulated within and averaged over batches (i.e. would a value correspond to the combined values of the 16 samples in a batch)? Or are they given per sample (i.e. averaged over the whole data set)? A small sketch of the two readings follows.
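To make the two readings concrete, here is a minimal NumPy sketch; the per_sample_error values are made up purely for illustration. It shows that, for equal-sized batches, averaging batch means and averaging over the whole data set coincide:
import numpy as np

# Made-up per-sample errors for 8 samples (illustration only).
per_sample_error = np.arange(8, dtype=float)

# Reading 1: average over the whole data set at once.
dataset_mean = per_sample_error.mean()

# Reading 2: average within each batch, then average the batch means.
batch_means = per_sample_error.reshape(2, 4).mean(axis=1)  # two batches of 4
running_mean = batch_means.mean()

print(dataset_mean, running_mean)  # 3.5 3.5 -- identical for equal-sized batches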
Apparently metrics are computed per output rather than per sample. This is loosely indicated by the documentation of model.fit
; namely, it states that if one specifies a different loss for each output node, then the summed loss will be minimized. This indicates two things: firstly, the losses (metrics) are not computed per sample but per output (though averaged within and over batches). If the losses (metrics) for each output were averaged over the various outputs, then this procedure would be equivalent to a per-sample computation. However, secondly, the documentation indicates that losses for different outputs are summed, not averaged. So this requires a bit more investigation.
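The difference between the two conventions is easy to state in NumPy; the per-output loss values below are made up for illustration:
import numpy as np

# Made-up per-output losses for a two-output model (illustration only).
loss_per_output = np.array([70.0, 85.0])
output_weights = np.ones_like(loss_per_output)  # default weight of 1 per output

summed_loss = np.sum(output_weights * loss_per_output)  # 155.0 -- what the docs describe
averaged_loss = np.mean(loss_per_output)                # 77.5  -- per-sample-like average
print(summed_loss, averaged_loss)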
Diving into the source code reveals that loss functions are indeed stored per output. If we don't specify weights for the various outputs manually, a weight of one is assigned by default. The relevant loss computation then starts here. Losses are summed and no average seems to be taken. We should be able to verify this with a quick experiment:
from keras.initializers import Ones, Zeros
from keras.models import Sequential
from keras.layers import Dense
import numpy as np
x = np.arange(16).reshape(8, 2).astype(float)
y = np.zeros((8, 2), dtype=float)
model = Sequential()
model.add(Dense(2, input_dim=2, kernel_initializer=Ones(), bias_initializer=Zeros(), trainable=False))
model.compile('sgd', loss='mean_absolute_error', metrics=['mean_absolute_error', 'mean_squared_error'])
# Metrics per sample and output.
ae = np.abs(np.sum(x, axis=1)[:, None] - y) # Absolute error.
se = (np.sum(x, axis=1)[:, None] - y)**2 # Squared error.
print('Expected metrics for averaging over samples but summing over outputs:')
print(f'\tMAE: {np.sum(np.mean(ae, axis=0))}, MSE: {np.sum(np.mean(se, axis=0))}', end='\n\n')
print('Expected metrics for averaging over samples and averaging over outputs:')
print(f'\tMAE: {np.mean(np.mean(ae, axis=0))}, MSE: {np.mean(np.mean(se, axis=0))}')
for batch_size in [1, 2, 4, 8]:
    print(f'\n# Batch size: {batch_size}')
    model.fit(x, y, batch_size=batch_size, epochs=1, shuffle=False)
Which produces the following output:
Expected metrics for averaging over samples but summing over outputs:
MAE: 30.0, MSE: 618.0
Expected metrics for averaging over samples and averaging over outputs:
MAE: 15.0, MSE: 309.0
# Batch size: 1
Epoch 1/1
8/8 [==============================] - 0s 4ms/step - loss: 15.0000 - mean_absolute_error: 15.0000 - mean_squared_error: 309.0000
# Batch size: 2
Epoch 1/1
8/8 [==============================] - 0s 252us/step - loss: 15.0000 - mean_absolute_error: 15.0000 - mean_squared_error: 309.0000
# Batch size: 4
Epoch 1/1
8/8 [==============================] - 0s 117us/step - loss: 15.0000 - mean_absolute_error: 15.0000 - mean_squared_error: 309.0000
# Batch size: 8
Epoch 1/1
8/8 [==============================] - 0s 60us/step - loss: 15.0000 - mean_absolute_error: 15.0000 - mean_squared_error: 309.0000
Curiously, the reported metric values seem to be averaged over the outputs, while the documentation as well as the source code indicate they would be summed. I would be glad if someone could clarify what's going on here.
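For reference, the reported values coincide with a plain element-wise mean over all entries of the error arrays ae and se from the snippet above:
print(np.mean(ae))  # 15.0  -- matches the reported mean_absolute_error
print(np.mean(se))  # 309.0 -- matches the reported mean_squared_error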
Upvotes: 0
Views: 300
Reputation: 6750
To simplify the problem, let's define a "model" that returns the input as is.
from keras.layers import Input
from keras.models import Model
inp = Input((2,))
model = Model(inputs=inp, outputs=inp)
model.summary()
#__________________________________________________________________
#Layer (type)                 Output Shape              Param #
#=================================================================
#input_3 (InputLayer) (None, 2) 0
#=================================================================
#Total params: 0
#Trainable params: 0
#Non-trainable params: 0
#__________________________________________________________________
Although there are no parameters to train, let's train the model to see how Keras computes the metrics.
import numpy as np
x = np.arange(16).reshape(8, 2).astype(float)
y = np.zeros((8, 2), dtype=float)
model.compile(optimizer="adam", loss="mse", metrics=["mae"])
for bs in [1, 2, 3, 8]:
print("Training with batch size", bs)
model.fit(x, y, epochs=1, batch_size=bs)
print("")
I get:
Training with batch size 1
Epoch 1/1
8/8 [=============] - 0s 10ms/step - loss: 77.5000 - mean_absolute_error: 7.5000
Training with batch size 2
Epoch 1/1
8/8 [=============] - 0s 1ms/step - loss: 77.5000 - mean_absolute_error: 7.5000
Training with batch size 3
Epoch 1/1
8/8 [=============] - 0s 806us/step - loss: 77.5000 - mean_absolute_error: 7.5000
Training with batch size 8
Epoch 1/1
8/8 [=============] - 0s 154us/step - loss: 77.5000 - mean_absolute_error: 7.5000
So, the MSE (loss) is 77.5
and the MAE is 7.5
, regardless of the batch size.
To replicate the result, we can:
np.mean((x - y) ** 2)
# 77.5
np.mean(np.abs(x - y))
# 7.5
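Note that this element-wise mean is the same as first taking the mean of each output column and then averaging those per-output values, which is consistent with the "averaged over outputs" observation in the question:
per_output_mse = np.mean((x - y) ** 2, axis=0)  # one MSE per output column
print(per_output_mse)           # [70. 85.]
print(np.mean(per_output_mse))  # 77.5 -- matches the reported loss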
Now, as to the "weighted sum" statement in the Keras documentation: this is about a list of outputs, not about multi-column outputs.
from keras.layers import Input, Lambda
from keras.models import Model
inp = Input((2,))
y1 = Lambda(lambda x: x[:, 0:1], name="Y1")(inp)
y2 = Lambda(lambda x: x[:, 1:2], name="Y2")(inp)
model = Model(inputs=inp, outputs=[y1, y2])
model.summary()
#_____________________________________________________________________
#Layer (type) Output Shape Param # Connected to
#=====================================================================
#input_6 (InputLayer) (None, 2) 0
#_____________________________________________________________________
#Y1 (Lambda) (None, 1) 0 input_6[0][0]
#_____________________________________________________________________
#Y2 (Lambda) (None, 1) 0 input_6[0][0]
#=====================================================================
#Total params: 0
#Trainable params: 0
#Non-trainable params: 0
This model is exactly the same as the one above, except that the output is split into two.
The training outcome is as below.
model.compile(optimizer="adam", loss="mse", metrics=["mae"])
for bs in [1, 2, 3, 8]:
print("Training with batch size", bs)
model.fit(x, [y[:, 0:1], y[:, 1:2]], epochs=1, batch_size=bs)
print("")
#Training with batch size 1
#Epoch 1/1
#8/8 [==============================] - 0s 15ms/step - loss: 155.0000 -
#Y1_loss: 70.0000 - Y2_loss: 85.0000 - Y1_mean_absolute_error: 7.0000 -
#Y2_mean_absolute_error: 8.0000
#
#same for all batch sizes
Keras now computes the loss for each output separately, then takes their sum. We can replicate the result by:
np.mean(np.sum((x - y) ** 2, axis=-1))
# 155.0
np.mean(np.sum(np.abs(x - y), axis=-1))
# 15.0 (= 7.0 + 8.0)
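As a side note, the "weighted" part of the weighted sum can be checked with the loss_weights argument of model.compile; the weights below are chosen arbitrarily for illustration:
model.compile(optimizer="adam", loss="mse", metrics=["mae"],
              loss_weights=[1.0, 2.0])
model.fit(x, [y[:, 0:1], y[:, 1:2]], epochs=1, batch_size=8)
# The total loss should now be 1.0 * Y1_loss + 2.0 * Y2_loss
# = 70.0 + 2 * 85.0 = 240.0, while Y1_loss and Y2_loss themselves are unchanged.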
Upvotes: 1