Reputation: 1339
My question is simple: what is the validation data passed to `model.fit` in a Sequential model used for?
And does it affect how the model is trained? (Normally a validation set is used, for example, to choose the hyper-parameters of a model, but I think that does not happen here.)
I am talking about the validation set that can be passed like this:
from keras.models import Sequential

# Create model
model = Sequential()
# Add layers
model.add(...)
# Train model (use 10% of training set as validation set)
history = model.fit(X_train, Y_train, validation_split=0.1)
# Train model (use validation data as validation set)
history = model.fit(X_train, Y_train, validation_data=(X_test, Y_test))
I investigated a bit, and I saw that `keras.models.Sequential.fit` calls `keras.models.training.fit`, which creates variables like `val_acc` and `val_loss` (which can be accessed from callbacks). `keras.models.training.fit` also calls `keras.models.training._fit_loop`, which adds the validation data to `callbacks.validation_data`, and which also calls `keras.models.training._test_loop`, which loops over the validation data in batches using the model's `self.test_function`. The result of this function is used to fill the values of the logs, which are the values accessible from the callbacks.
After seeing all this, I feel that the validation set passed to `model.fit` is not used to validate anything during training, and its only use is to get feedback on how the trained model performs after every epoch on a completely independent set. Therefore, it would be okay to use the same validation and test set, right?
Could anyone confirm whether the validation set in `model.fit` has any other purpose besides being read from the callbacks?
Upvotes: 102
Views: 120422
Reputation: 10184
If you want to build a solid model you have to follow the standard protocol of splitting your data into three sets: one for training, one for validation, and one for final evaluation, which is the test set.
The idea is that you train on your training data and tune your model with the results of the metrics (accuracy, loss, etc.) that you get from your validation set.
Your model doesn't "see" your validation set and isn't in any way trained on it, but you, as the architect and master of the hyperparameters, tune the model according to this data. Therefore it indirectly influences your model, because it directly influences your design decisions: you nudge your model to work well with the validation data, and that can introduce a bias.
That is exactly why you only evaluate your model's final score on data that neither your model nor you yourself have used, and that is the third chunk of data: your test set.
Only this procedure makes sure you get an unaffected view of your model's quality and its ability to generalize what it has learned to totally unseen data.
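Concretely, the three-way split described above might look like this (a minimal NumPy sketch; the 60/20/20 ratio, the `three_way_split` helper, and the array names are just placeholders for whatever your pipeline uses):

```python
import numpy as np

def three_way_split(X, y, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle the data once, then carve off test, validation, and
    training subsets from the shuffled index order."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(len(X) * test_frac)
    n_val = int(len(X) * val_frac)
    test_idx = idx[:n_test]
    val_idx = idx[n_test:n_test + n_val]
    train_idx = idx[n_test + n_val:]
    return (X[train_idx], y[train_idx],
            X[val_idx], y[val_idx],
            X[test_idx], y[test_idx])

X = np.arange(100).reshape(100, 1)
y = np.arange(100)
X_tr, y_tr, X_val, y_val, X_te, y_te = three_way_split(X, y)
print(len(X_tr), len(X_val), len(X_te))  # 60 20 20
```

You would then pass `(X_val, y_val)` as `validation_data` to `model.fit` and touch `(X_te, y_te)` only once, for the final evaluation.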
Upvotes: 110
Reputation: 51
So basically, on the validation set the model will try to predict, but it won't update its weights (which means that it won't learn from it), so you get a clear idea of how well your model can find patterns in the training data and apply them to new data.
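A toy illustration of that point (a single-parameter model in plain NumPy, not Keras): evaluating on validation data only computes a loss, while a training step actually moves the weight.

```python
import numpy as np

w = 0.0  # single model parameter for y_hat = w * x

def evaluate(w, x, y):
    """Validation: compute the MSE loss only -- no weight update."""
    return float(np.mean((w * x - y) ** 2))

def train_step(w, x, y, lr=0.01):
    """Training: one gradient-descent update on the MSE loss."""
    grad = float(np.mean(2 * (w * x - y) * x))
    return w - lr * grad

x_train, y_train = np.array([1.0, 2.0]), np.array([2.0, 4.0])
x_val, y_val = np.array([3.0]), np.array([6.0])

val_loss = evaluate(w, x_val, y_val)  # uses validation data...
assert w == 0.0                       # ...but the weight is untouched
w = train_step(w, x_train, y_train)   # training data does update it
assert w != 0.0
print(val_loss, w)  # 36.0 0.1
```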
Upvotes: 5
Reputation: 2670
I think an overall discussion on train-set, validation-set and test-set will help:
Summarizing:
Again some practical issues here:
Upvotes: 15
Reputation: 3033
This YouTube video explains what a validation set is, why it's helpful, and how to implement a validation set in Keras: Create a validation set in Keras
With a validation set, you're essentially taking a fraction of your samples out of your training set, or creating an entirely new set altogether, and holding the samples in this set out of training.
During each epoch, the model will be trained on the samples in the training set but will NOT be trained on those in the validation set. Instead, the model will only be validated on each sample in the validation set.
The purpose of this is to let you judge how well your model can generalize, i.e., how well it can predict on data that it has not seen while being trained.
Having a validation set also provides great insight into whether your model is overfitting. This can be judged by comparing the `acc` and `loss` from your training samples to the `val_acc` and `val_loss` from your validation samples. For example, if your `acc` is high but your `val_acc` is lagging way behind, this is a good indication that your model is overfitting.
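As a sketch of that check (the `history` dicts below are made-up numbers shaped like Keras' `history.history`, and the 0.10 gap threshold is an arbitrary choice, not a Keras rule):

```python
def looks_overfit(history, gap=0.10):
    """Flag a large gap between final training and validation accuracy."""
    return history["acc"][-1] - history["val_acc"][-1] > gap

# Hypothetical per-epoch metrics, shaped like Keras' history.history
good = {"acc": [0.80, 0.88, 0.91], "val_acc": [0.78, 0.86, 0.89]}
bad  = {"acc": [0.85, 0.95, 0.99], "val_acc": [0.80, 0.81, 0.80]}

print(looks_overfit(good))  # False
print(looks_overfit(bad))   # True
```

In practice you would look at the whole curves rather than a single threshold, but the idea is the same: training accuracy racing ahead of validation accuracy signals overfitting.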
Upvotes: 33