Akim Tsvigun

Reputation: 91

Loss function in Keras does not match the analogous NumPy computation

I compared the results obtained via model.evaluate(...) with the ones computed via NumPy. As you can see, they differ a lot. The kernel has just been restarted, so there is no stale state. I cannot find where the problem is.

import numpy as np
import keras
from keras.layers import Dense
from keras.models import Sequential
import keras.backend as K

X = np.random.rand(10000)
Y = X + np.random.rand(10000) / 5

X_train, X_valid = X[:8000], X[8000:]
Y_train, Y_valid = Y[:8000], Y[8000:]

model = Sequential([
    Dense(1, input_shape=(1,), activation='linear'),
])
model.compile('adam', 'mae')
model.fit(X_train, Y_train, epochs=1, batch_size=2000, validation_data=(X_valid, Y_valid))

print(model.evaluate(X_valid, Y_valid))
>>> 0.15643194556236267

preds = model.predict(X_valid)
np.abs(Y_valid - preds).mean()
>>> 0.34461398701699736

Versions: keras = '2.3.1', tensorflow = '2.1.0'.

Upvotes: 1

Views: 114

Answers (2)

xdurch0

Reputation: 10474

This is a tricky one, but actually simple to fix:

Your targets Y_valid have shape (2000,), i.e. just an array of 2000 numbers. The network outputs, however, have shape (2000, 1). The expression Y_valid - preds then tries to subtract an array of shape (2000, 1) from one of shape (2000,). The two shapes are not directly compatible, so they need to be broadcast. The standard broadcasting rules proceed as follows:

1. Align the shapes on the right
   (      2000,)
   (2000,    1 )

2. Pad the shorter shape with a leading 1
   (   1, 2000)
   (2000,    1)

3. Broadcast each size-1 axis to make the shapes match
   (2000, 2000)
   (2000, 2000)

...and so you are actually subtracting two arrays of size (2000, 2000) from each other. You are basically computing the difference between each prediction and all targets instead of just the corresponding one. Obviously, the mean of this will be much larger.
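The effect above can be reproduced at a small scale in plain NumPy (an illustrative sketch with made-up data, not the asker's model):

```python
import numpy as np

# A (4,) target array minus a (4, 1) prediction array broadcasts
# to a full (4, 4) difference matrix, not an elementwise difference.
targets = np.array([1.0, 2.0, 3.0, 4.0])        # shape (4,)
preds = np.array([[1.0], [2.0], [3.0], [4.0]])  # shape (4, 1)

diff = targets - preds
print(diff.shape)  # (4, 4): every prediction minus every target

# The intended elementwise MAE is 0 here (predictions equal targets),
# but the broadcast version averages all 4*4 pairwise differences:
print(np.abs(diff).mean())                   # nonzero
print(np.abs(targets - preds[:, 0]).mean())  # 0.0, the intended MAE
```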

tl;dr: model.evaluate is correct. The manual computation is incorrect due to unintended broadcasting. You can fix it by reshaping the predictions to (2000,) (or the targets to (2000, 1)):

preds = model.predict(X_valid)[:, 0]
np.abs(Y_valid - preds).mean()
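The indexing fix above is one of several equivalent ways to remove the size-1 axis; a quick sketch with random stand-in data (not the asker's actual predictions) shows they all agree:

```python
import numpy as np

# Stand-in arrays with the same shapes as in the question.
preds_2d = np.random.rand(2000, 1)  # model.predict-style output
y = np.random.rand(2000)            # 1-D targets

a = np.abs(y - preds_2d[:, 0]).mean()        # indexing, as in the answer
b = np.abs(y - preds_2d.ravel()).mean()      # flatten the predictions
c = np.abs(y - np.squeeze(preds_2d)).mean()  # drop the size-1 axis
d = np.abs(y.reshape(-1, 1) - preds_2d).mean()  # or reshape the targets

# All four compute one difference per sample, so the MAEs match.
assert np.allclose([a, b, c], d)
```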

Upvotes: 1

mcemilg

Reputation: 976

It's because the model.predict output shape is not the same as that of Y_valid. If you take the transpose of the predictions, it will give you almost the same loss.

>>> Y_valid.shape                                                          
(2000,)
>>> preds.shape                                                            
(2000, 1)
>>> np.abs(Y_valid - np.transpose(preds)).mean()
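The transpose works because (2000,) minus (1, 2000) broadcasts to (1, 2000): still one difference per sample, just with an extra leading axis. A quick check with random stand-in data (not the asker's actual arrays):

```python
import numpy as np

y = np.random.rand(2000)         # shape (2000,)
preds = np.random.rand(2000, 1)  # shape (2000, 1)

diff = y - preds.T               # (2000,) - (1, 2000) -> (1, 2000)
print(diff.shape)                # (1, 2000), not (2000, 2000)

# The extra leading axis does not change the mean, so this equals
# the elementwise MAE from the other fix.
mae = np.abs(diff).mean()
assert np.allclose(mae, np.abs(y - preds[:, 0]).mean())
```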

Upvotes: 2
