lbf_1994

Reputation: 259

Meaning of batch_size in model.evaluate()

I am building a plain vanilla FNN and want to evaluate my model after training. I was wondering what impact batch_size has when evaluating the model on a test set. It is clearly relevant for training, since it determines how many samples are fed to the network before the next gradient update, and it can also be needed when predicting with a (stateful) RNN. But it is not clear to me why it is needed when evaluating a model, especially an FNN. Furthermore, I get slightly different values when I evaluate the model on the same test set with different batch sizes. Consider the following toy example:

import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.optimizers import SGD

# function to be learned
def f(x):
    return x[0] + x[1] + x[2]

# sample training and test points uniformly from the cube [-10, 10]^3
x_train = np.random.uniform(low = -10, high = 10, size = (50,3))
y_train = np.apply_along_axis(f, 1, x_train).reshape(-1,1)

x_test = np.random.uniform(low = -10, high = 10, size = (50,3))
y_test = np.apply_along_axis(f, 1, x_test).reshape(-1,1)

model = Sequential()
model.add(Dense(20, input_dim = 3, activation = 'tanh'))
model.add(Dense(1))

sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='mse',
      optimizer=sgd)
model.fit(x_train, y_train, batch_size = 10, epochs = 30, verbose = 0)

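# evaluate the same test set with different batch sizes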
model.evaluate(x_test, y_test, batch_size = 10)
model.evaluate(x_test, y_test, batch_size = 20)
model.evaluate(x_test, y_test, batch_size = 30)
model.evaluate(x_test, y_test, batch_size = 40)
model.evaluate(x_test, y_test, batch_size = 50)

The values are very similar but nevertheless different. Where does this come from? Shouldn't the following always be true?

from sklearn.metrics import mean_squared_error as mse
0 == model.evaluate(x_test, y_test) - mse(y_test, model.predict(x_test))

Upvotes: 14

Views: 11431

Answers (2)

Dr. Snoopy

Reputation: 56347

No, they don't have to be the same. If you combine floating point math with parallelism, you don't get reproducible results, because (a + b) + c is not necessarily the same as a + (b + c) when a, b, and c are floating point numbers.
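A minimal sketch of that non-associativity (an illustration, not part of the original answer): plain Python floats already show it, and grouping the same per-sample errors into differently sized batches changes the order of the float32 additions, which is effectively what happens when you vary batch_size in evaluate.

import numpy as np

# floating point addition is not associative
a, b, c = 0.1, 0.2, 0.3
print((a + b) + c)                   # 0.6000000000000001
print(a + (b + c))                   # 0.6
print((a + b) + c == a + (b + c))    # False

# the same effect with batch-wise accumulation of a loss:
# averaging 50 squared errors all at once vs. in 5 batches of 10
errors = np.random.uniform(0, 1, size=50).astype(np.float32)
mean_all_at_once = errors.mean()
mean_per_batch = errors.reshape(5, 10).mean(axis=1).mean()
print(mean_all_at_once, mean_per_batch)   # may differ slightly in the last digits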

The evaluate function of Model takes a batch size simply to speed up evaluation: the network can process multiple samples at a time, and on a GPU this makes evaluation much faster. I think the only way to reduce this effect would be to set batch_size to one.

Upvotes: 6

ykaner

Reputation: 1840

The evaluation values differ simply because floating point values have limited precision.

The reason for using a batch size in evaluate is the same as for using one during training, and it is not, as you said, that:

it is relevant for training as it determines the number of samples to be fed to the network before computing the next gradient

Just think about it: why can't you feed the whole dataset at once, without batches? Because there is not enough RAM to hold it all. The same constraint applies when evaluating.
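A small sketch of what batched evaluation amounts to (an illustration for a single-output regression model like the one above, not the actual Keras internals; evaluate_in_batches is a made-up helper name): the test set is processed one chunk at a time, so the forward pass only ever holds one batch, and the per-batch errors are then combined.

import numpy as np

def evaluate_in_batches(model, x, y, batch_size=10):
    # accumulate the squared error one batch at a time, so the forward
    # pass only ever has to process batch_size samples at once
    total_squared_error = 0.0
    for start in range(0, len(x), batch_size):
        xb = x[start:start + batch_size]
        yb = y[start:start + batch_size]
        preds = model.predict(xb, verbose=0)
        total_squared_error += np.sum((preds - yb) ** 2)
    return total_squared_error / len(x)

# should be close, though not necessarily bit-identical, to
# model.evaluate(x_test, y_test)
print(evaluate_in_batches(model, x_test, y_test, batch_size=20))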

Upvotes: 1
