Long

Reputation: 1793

Difference between doing cross-validation and validation_data/validation_split in Keras

First, I split the dataset into train and test, for example:

X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.4, random_state=999)

I then use GridSearchCV with cross-validation to find the best performing model:

validator = GridSearchCV(estimator=clf, param_grid=param_grid, scoring="accuracy", cv=cv)

And by doing this, I have:

A model is trained using k-1 of the folds as training data; the resulting model is validated on the remaining part of the data (scikit-learn.org)
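As a minimal sketch of that procedure, the snippet below runs a grid search with 5-fold cross-validation on the training portion only. The estimator (scikit-learn's `LogisticRegression`, standing in for the wrapped Keras model `clf`), the `param_grid` values, and `cv=5` are assumptions made for illustration; none of them come from my actual setup.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.4, random_state=999
)

# Stand-in estimator and grid (assumed); the real clf would be a wrapped Keras model.
clf = LogisticRegression(max_iter=1000)
param_grid = {"C": [0.1, 1.0, 10.0]}
n_folds = 5

validator = GridSearchCV(estimator=clf, param_grid=param_grid, scoring="accuracy", cv=n_folds)
validator.fit(X_train, y_train)

# Each fold is held out once; every parameter setting gets one score per fold.
fold_scores = [validator.cv_results_[f"split{i}_test_score"] for i in range(n_folds)]
best_params = validator.best_params_

# The test set was never seen during the search, so it gives the final estimate.
test_accuracy = validator.score(X_test, y_test)
```

Here the held-out fold rotates inside `fit`, and `X_test` is touched only once, after the search has finished.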

But then, when reading about the Keras fit function, the documentation introduces two more terms:

validation_split: Float between 0 and 1. Fraction of the training data to be used as validation data. The model will set apart this fraction of the training data, will not train on it, and will evaluate the loss and any model metrics on this data at the end of each epoch. The validation data is selected from the last samples in the x and y data provided, before shuffling.

validation_data: tuple (x_val, y_val) or tuple (x_val, y_val, val_sample_weights) on which to evaluate the loss and any model metrics at the end of each epoch. The model will not be trained on this data. validation_data will override validation_split.
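To make the `validation_split` description concrete, here is a small pure-Python sketch that mimics the documented behaviour: the last fraction of the samples is held out, and no shuffling happens before the split. The helper name `split_like_validation_split` is hypothetical and this is not Keras's own code, only an illustration of the rule quoted above.

```python
def split_like_validation_split(x, y, validation_split):
    """Hold out the last fraction of the samples, without shuffling first."""
    split_at = int(len(x) * (1.0 - validation_split))
    train = (x[:split_at], y[:split_at])
    val = (x[split_at:], y[split_at:])
    return train, val


# Toy data: ten samples with alternating labels.
x = list(range(10))
y = [v % 2 for v in x]

(train_x, train_y), (val_x, val_y) = split_like_validation_split(x, y, 0.2)
# train_x holds samples 0..7, val_x holds the last two samples, 8 and 9.
```

One consequence of taking the tail of the data: if the samples are sorted by label, the held-out part may contain only one class, so shuffling beforehand (as `train_test_split` does) matters.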

From what I understand, validation_split (which validation_data overrides when both are given) produces a fixed validation set that stays the same for the whole training run, whereas the hold-out set in cross-validation changes at each cross-validation step.
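The contrast can be sketched with sample indices alone. The two helpers below are hypothetical and simplified (contiguous, unshuffled folds, with any remainder placed in the last fold, which is not exactly how scikit-learn's `KFold` distributes it); they only show which samples are held out at each step.

```python
def fixed_holdout(n_samples, validation_split):
    """Indices held out by a validation_split-style rule: always the tail."""
    split_at = int(n_samples * (1.0 - validation_split))
    return list(range(split_at, n_samples))


def kfold_holdouts(n_samples, k):
    """Indices held out in each of k contiguous, unshuffled folds."""
    fold_size = n_samples // k
    folds = []
    for i in range(k):
        start = i * fold_size
        stop = n_samples if i == k - 1 else start + fold_size
        folds.append(list(range(start, stop)))
    return folds


n_samples = 10
epochs = 3

# Same hold-out set at the end of every epoch.
holdout_per_epoch = [fixed_holdout(n_samples, 0.2) for _ in range(epochs)]

# A different hold-out set for every fold; together they cover all samples.
holdout_per_fold = kfold_holdouts(n_samples, 5)
```

With the fixed rule, samples 8 and 9 are never trained on; with k-fold, every sample is used for validation exactly once and for training in the other folds.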

Upvotes: 4

Views: 2125

Answers (1)

today

Reputation: 33410

Validation is performed to check that the model is not overfitting the training data and that it will generalize to new data. Since the parameter grid search already performs validation, there is no need for the Keras model to run its own validation step during training. Therefore, to answer your questions:

is it necessary to use validation_split or validation_data since I already do cross validation?

No, as I mentioned above.

if it is not necessary, then should I set validation_split and validation_data to 0 and None, respectively?

No, you don't need to set them explicitly, since by default Keras does no validation (i.e. the defaults in the fit() method are validation_split=0.0 and validation_data=None).

If I do so, what will happen during the training, would Keras just simply ignore the validation step?

Yes, Keras won't perform validation when training the model. Note, however, that as I mentioned above, the grid search procedure still performs validation in order to better estimate the performance of the model with a specific set of parameters.

Upvotes: 4
