mka

Reputation: 101

Custom RMSE not the same as taking the root of built-in Keras MSE for same prediction

I have defined a custom RMSE function:

from keras import backend as K

def rmse(y_pred, y_true):
    return K.sqrt(K.mean(K.square(y_pred - y_true)))

I was evaluating it against the mean squared error provided by Keras:

keras.losses.mean_squared_error(y_true, y_pred)

For the same prediction, the values I get for the MSE and RMSE metrics are:

mse: 115.7218 - rmse: 8.0966

Now, when I take the square root of the MSE, I get 10.7574, which is clearly higher than the 8.0966 that the custom RMSE function outputs. I haven't been able to figure out why this is, nor have I found any related posts on this particular topic. Is there maybe a mistake in the RMSE function that I'm simply not seeing? Or is it somehow related to how Keras defines axis=-1 in its MSE function (whose purpose I haven't fully understood yet)?

Here is where I invoke the RMSE and MSE:

model.compile(loss="mae", optimizer="adam", metrics=["mse", rmse])

So I would expect the root of MSE to be the same as the RMSE.

I originally asked this question on Cross Validated but it was put on hold as off-topic.

Upvotes: 9

Views: 2137

Answers (2)

Md Hishamur Rahman

Reputation: 335

Although sqrt(mse) is equal to rmse for a simple model configuration, as Manoj's answer shows, I faced this problem with a more complex model configuration and was unable to figure out why it happened. However, I found a workaround for anyone who needs to monitor rmse as a metric but is running into the same problem as in the question. I used a LambdaCallback in the callbacks to print the rmse of the training and validation sets after every epoch, and it worked:

from tensorflow import keras
from tensorflow.keras import backend as K

def rmse(y_true, y_pred):
    return K.sqrt(K.mean(K.square(y_true - y_pred)))

# Print train/validation RMSE at the end of every epoch.
# float(...) converts the scalar tensor so the .4f format spec works.
rmse_print_callback = keras.callbacks.LambdaCallback(
    on_epoch_end=lambda epoch, logs: print(
        f"rmse: {float(rmse(training_labels, model.predict(training_data))):.4f} - "
        f"val_rmse: {float(rmse(validation_labels, model.predict(validation_data))):.4f}"))

model.fit(training_data, training_labels, epochs=100, callbacks=[rmse_print_callback])
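
Alternatively, assuming a TF 2.x setup, the built-in tf.keras.metrics.RootMeanSquaredError metric accumulates squared errors across batches and takes the root of the running mean, so its epoch value stays consistent with sqrt(mse). A minimal sketch:

import tensorflow as tf

# Stateful metric: accumulates squared error over all batches in the
# epoch and reports the square root of the running mean squared error.
model.compile(loss="mae", optimizer="adam",
              metrics=["mse", tf.keras.metrics.RootMeanSquaredError()])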

Upvotes: 0

Manoj Mohan

Reputation: 6044

Is there maybe a mistake in the RMSE loss function that I'm simply not seeing? Or is it somehow related to how Keras defines axis=-1 in the MSE loss function (purpose of which I haven't fully understood yet)?

When Keras computes the loss, the batch dimension is retained, which is the reason for axis=-1: the returned value is a tensor with one loss value per sample rather than a single scalar. This is because the loss for each sample may have to be weighted before taking the mean, depending on whether arguments such as sample_weight are passed to the fit() method.
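
For example, here is a minimal sketch (assuming TF 2.x with eager execution) showing that keras.losses.mean_squared_error averages only over the last axis and returns one loss per sample:

import numpy as np
from tensorflow import keras

y_true = np.array([[0., 1.], [1., 1.]])
y_pred = np.array([[1., 1.], [1., 0.]])

# Mean over axis=-1 only: one MSE value per sample, not a single scalar.
per_sample = keras.losses.mean_squared_error(y_true, y_pred)
print(per_sample.numpy())  # [0.5 0.5]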

I get the same results with both approaches.

from tensorflow import keras
from tensorflow.keras import backend as K
import numpy as np

def rmse(y_pred, y_true):
    return K.sqrt(K.mean(K.square(y_pred - y_true)))

# A small model: 32 inputs, 10 linear outputs.
l1 = keras.layers.Input(shape=(32,))
l2 = keras.layers.Dense(10)(l1)
model = keras.Model(inputs=l1, outputs=l2)

train_examples = np.random.randn(5, 32)
train_labels = np.random.randn(5, 10)

MSE approach

model.compile(loss='mse', optimizer='adam')
model.evaluate(train_examples, train_labels)

RMSE approach

model.compile(loss=rmse, optimizer='adam')
model.evaluate(train_examples, train_labels)

Output

5/5 [==============================] - 0s 8ms/sample - loss: 1.9011
5/5 [==============================] - 0s 2ms/sample - loss: 1.3788

sqrt(1.9011) ≈ 1.3788

Upvotes: 6
