Use same data for test and validation

Hey, I am training a CNN model and was wondering what will happen if I use the same data for validation and test. Does the model train on the validation data as well? (Does my model see the validation data?) Or are just the error and accuracy calculated and taken into account for training?

Upvotes: 2

Answers (4)

Daniel Malachov

Reputation: 1842

I wanted to find an answer to the same question and found mostly noise. I will try to answer how I personally see it after some research.

You can use the same data subset as both the validation and the test dataset if you ensure that, during model selection and hyperparameter tuning, the validation dataset is never exposed to model fitting. I don't know whether this holds in packages like Optuna or the Keras Hyperband tuner. If that condition holds, I don't see how this could lead to overfitting.

I built some ML pipelines myself and had full control over how model selection and hyperparameter tuning were done, so I knew that I used the validation dataset only to calculate metrics and never to fit the models. I don't see any reason to have a validation set that differs from the test set, especially when I didn't have much data and setting aside, for example, 10 % for the validation set and 10 % for the test set made the model perform worse.

Upvotes: 0

jeremy_rutman

Reputation: 5720

If you use the same set for validation and test, your model may overfit (since it has seen the test data before the final test stage).

Upvotes: 2

pouyan

Reputation: 3439

Take a look at this article for more information. Here are the parts of it most relevant to your question:

A validation dataset is a sample of data held back from training your model that is used to give an estimate of model skill while tuning model’s hyperparameters.

The validation dataset is different from the test dataset that is also held back from the training of the model, but is instead used to give an unbiased estimate of the skill of the final tuned model when comparing or selecting between final models.
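As a rough illustration of the distinction quoted above, here is a minimal sketch of a three-way split in plain Python. The function name `three_way_split` and the 10 % fractions are assumptions chosen for illustration; they do not come from the article.

```python
import random


def three_way_split(items, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle items and split them into disjoint train / validation / test lists.

    Hypothetical helper: the name and default fractions are illustrative only.
    """
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    shuffled = list(items)
    rng.shuffle(shuffled)

    n_val = int(len(shuffled) * val_frac)
    n_test = int(len(shuffled) * test_frac)

    val = shuffled[:n_val]                     # used only while tuning hyperparameters
    test = shuffled[n_val:n_val + n_test]      # touched once, for the final estimate
    train = shuffled[n_val + n_test:]          # the only part the model is fit on
    return train, val, test


train, val, test = three_way_split(range(100))
print(len(train), len(val), len(test))  # 80 10 10
```

The point of keeping `val` and `test` disjoint is that the test portion plays no role in any tuning decision, so the score measured on it is not influenced by those decisions.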

Upvotes: 1

theletz

Reputation: 1805

You use your validation_set to tune your model. This means that you don't train on this data, but the model still takes it into account. For example, you use it to tune the model's hyperparameters.

In order to get a good evaluation, your test set should be data that is totally unknown to the model.
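To make this concrete, here is a small simulation sketch (plain Python, not tied to any particular framework). It assumes a deliberately artificial setup: every candidate "model" just guesses at random, so its true accuracy is 50 %. The function names and the candidate and sample counts are made up for illustration.

```python
import random


def accuracy(preds, labels):
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def selection_bias_demo(n_candidates=200, n_val=50, n_test=50, seed=0):
    """Pick the best of many random-guess models by validation accuracy.

    Returns (validation accuracy of the winner, test accuracy of the winner).
    No model is ever trained here; the validation set is used only for selection.
    """
    rng = random.Random(seed)
    val_labels = [rng.randint(0, 1) for _ in range(n_val)]
    test_labels = [rng.randint(0, 1) for _ in range(n_test)]

    best_val_acc, best_test_acc = -1.0, None
    for _ in range(n_candidates):
        # Each candidate predicts at random on both sets (true skill: 50 %).
        val_preds = [rng.randint(0, 1) for _ in range(n_val)]
        test_preds = [rng.randint(0, 1) for _ in range(n_test)]
        val_acc = accuracy(val_preds, val_labels)
        if val_acc > best_val_acc:
            best_val_acc = val_acc
            best_test_acc = accuracy(test_preds, test_labels)
    return best_val_acc, best_test_acc


best_val_acc, best_test_acc = selection_bias_demo()
print("validation accuracy of selected model:", best_val_acc)
print("test accuracy of selected model:", best_test_acc)
```

Because the winner is chosen for its validation score, that score tends to come out well above 50 % even though no candidate has any real skill, while the score on the untouched test set is not inflated by the selection. If the same data served as both validation and test set, the reported number would be the optimistic one.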

Upvotes: 2
