Reputation: 101
I have defined a custom RMSE function:
def rmse(y_pred, y_true):
    return K.sqrt(K.mean(K.square(y_pred - y_true)))
I was evaluating it against the mean squared error provided by Keras:
keras.losses.mean_squared_error(y_true, y_pred)
The values I get for MSE and RMSE metrics respectively for some (the same) prediction are:
mse: 115.7218 - rmse: 8.0966
Now, when I take the square root of the MSE, I get 10.7574, which is obviously higher than what the custom RMSE function outputs. I haven't been able to figure out why this is, nor have I found any related posts on this particular topic. Is there maybe a mistake in the RMSE function that I'm simply not seeing? Or is it somehow related to how Keras defines axis=-1
in the MSE function (whose purpose I haven't fully understood yet)?
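For what it's worth, one arithmetic effect that can produce exactly this pattern (a sketch of a possible cause, not a claim about what your Keras version does internally): metric values displayed during training are averages over batches, and the average of per-batch RMSEs is generally smaller than the square root of the average of per-batch MSEs (Jensen's inequality). A small NumPy sketch with made-up batch values:

```python
import numpy as np

# Hypothetical per-batch mean squared errors for two batches.
batch_mse = np.array([4.0, 100.0])

# Averaging the per-batch RMSEs (what a batch-averaged rmse metric would show)...
mean_of_rmse = np.sqrt(batch_mse).mean()   # (2 + 10) / 2 = 6.0

# ...is smaller than the square root of the averaged MSE.
rmse_of_mean = np.sqrt(batch_mse.mean())   # sqrt(52) ~ 7.21

print(mean_of_rmse, rmse_of_mean)
```

So a displayed rmse metric below sqrt of the displayed mse is consistent with batch averaging, without any bug in the rmse function itself.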
Here is where I invoke the RMSE and MSE:
model.compile(loss="mae", optimizer="adam", metrics=["mse", rmse])
So I would expect the root of MSE to be the same as the RMSE.
I originally asked this question on Cross Validated but it was put on hold as off-topic.
Upvotes: 9
Views: 2137
Reputation: 335
Although sqrt(mse)
is equal to rmse
for a simple model configuration, as Manoj's answer shows, I faced this problem with a more complex model configuration and was unable to figure out why it happened. However, I found a workaround for anyone who badly needs to monitor rmse
as a metric but runs into the same problem as in the question. I used a LambdaCallback
in the callbacks
to print the rmse
on the training and validation data after every epoch, and it worked:
def rmse(y_true, y_pred):
    return K.sqrt(K.mean(K.square(y_true - y_pred)))

rmse_print_callback = keras.callbacks.LambdaCallback(
    on_epoch_end=lambda epoch, logs: print(
        f"rmse: {float(rmse(training_labels, model.predict(training_data))):.4f} "
        f"- val_rmse: {float(rmse(validation_labels, model.predict(validation_data))):.4f}"
    )
)

model.fit(training_data, training_labels, epochs=100, callbacks=[rmse_print_callback])
Upvotes: 0
Reputation: 6044
Is there maybe a mistake in the RMSE loss function that I'm simply not seeing? Or is it somehow related to how Keras defines axis=-1 in the MSE loss function (purpose of which I haven't fully understood yet)?
When Keras computes the loss, the batch dimension is retained, which is the reason for axis=-1
: the reduction runs only over the last axis, so the returned value is a tensor with one loss per sample. This is because the loss for each sample may have to be weighted before taking the mean, depending on whether certain arguments, such as sample_weight
, are passed to the fit()
method.
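The per-sample reduction can be mimicked in plain NumPy (this sketch reproduces the axis=-1 behaviour of the MSE loss; the sample-weighting step is an illustrative assumption about how such weights would be applied, not Keras source code):

```python
import numpy as np

y_true = np.array([[1.0, 2.0], [3.0, 4.0]])
y_pred = np.array([[1.5, 2.0], [2.0, 4.0]])

# Per-sample MSE: mean over the last axis only, one loss value per sample.
per_sample = np.mean(np.square(y_true - y_pred), axis=-1)
print(per_sample)  # [0.125 0.5  ]

# Because the batch dimension survives, per-sample weights can be
# applied before the final mean.
weights = np.array([1.0, 2.0])
weighted = np.sum(per_sample * weights) / np.sum(weights)
print(weighted)  # 0.375
```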
I get the same results with both the approaches.
from tensorflow import keras
import numpy as np
from keras import backend as K
def rmse(y_pred, y_true):
    return K.sqrt(K.mean(K.square(y_pred - y_true)))

l1 = keras.layers.Input(shape=(32,))
l2 = keras.layers.Dense(10)(l1)
model = keras.Model(inputs=l1, outputs=l2)

train_examples = np.random.randn(5, 32)
train_labels = np.random.randn(5, 10)
MSE approach
model.compile(loss='mse', optimizer='adam')
model.evaluate(train_examples, train_labels)
RMSE approach
model.compile(loss=rmse, optimizer='adam')
model.evaluate(train_examples, train_labels)
Output
5/5 [==============================] - 0s 8ms/sample - loss: 1.9011
5/5 [==============================] - 0s 2ms/sample - loss: 1.3788
sqrt(1.9011) = 1.3788
Upvotes: 6