Reputation: 301
Can someone clear up this doubt of mine?
While evaluating a model we should use a smaller set, and the dev set is a small set. So we try something on the dev set, come to a conclusion, and then go to the train set to train it properly and check.
OR
We train on the training set and evaluate the model on the dev set, with the dev set as a benchmark.
Upvotes: 1
Views: 2596
Reputation: 1484
We train our model on the training set and evaluate it on the dev and test sets. In a sense, the purpose of the test set is to make sure that our evaluation on the dev set is correct (we expect the dev and test errors to have close values).
Dev and test sets should have the same distribution; if they had different distributions we wouldn't be able to compare errors and reason about results.
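For illustration, here is a minimal sketch of such a split (assuming scikit-learn, placeholder random data, and an arbitrary 60/20/20 ratio). Stratifying on the labels is one simple way to keep train, dev, and test on the same label distribution:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(1000, 10)        # placeholder features
y = np.random.randint(0, 2, 1000)   # placeholder binary labels

# First carve out the test set, then split the remainder into train and dev.
# stratify=y keeps the class proportions the same in every split.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)
X_train, X_dev, y_train, y_dev = train_test_split(
    X_rest, y_rest, test_size=0.25, stratify=y_rest, random_state=0)

print(len(X_train), len(X_dev), len(X_test))  # 600 200 200
```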
Upvotes: 0
Reputation: 109
Your first scenario is correct.
As training a deep neural network may take an enormous amount of time, it is not a good idea to first build the final model and only then, at the very end, try to evaluate whether it is working well or not! So what's the solution? This is where the dev set (also known as the validation set) comes in: it is much better to evaluate our model at the same time it is being trained.
To do this, we split our dataset into train data and dev data. Now, in each epoch, we can compute two extra metrics for our model aside from "accuracy" and "loss": dev_accuracy and dev_loss, which can be helpful for figuring out what is wrong with the model. For instance, if accuracy shows a high value on our training data (say 0.92) but dev_accuracy is only 0.3, it obviously means our model's issue is overfitting! (Why?) Because it works very well on the training data but cannot make good predictions on data that is new to the model (data it has never been shown during training).
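As a concrete illustration (a minimal sketch assuming TensorFlow/Keras and random placeholder data, not the actual model in question), passing the dev set to fit() via validation_data is what produces those per-epoch dev metrics:

```python
import numpy as np
from tensorflow import keras

# Placeholder data standing in for a real train/dev split.
X_train = np.random.rand(800, 20)
y_train = np.random.randint(0, 2, 800)
X_dev = np.random.rand(200, 20)
y_dev = np.random.randint(0, 2, 200)

model = keras.Sequential([
    keras.Input(shape=(20,)),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# validation_data makes Keras report val_loss and val_accuracy
# (the "dev_loss" / "dev_accuracy" discussed above) every epoch.
history = model.fit(X_train, y_train,
                    validation_data=(X_dev, y_dev),
                    epochs=10, batch_size=32)

# A large gap, e.g. accuracy ~0.92 vs. val_accuracy ~0.3, points to overfitting.
print(history.history["accuracy"][-1], history.history["val_accuracy"][-1])
```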
Upvotes: 1
Reputation: 66815
Normally you would have three sets:
- train set: used to fit the model's parameters;
- validation (dev) set: used to fit hyperparameters and choose between candidate models;
- test set: used only for the final, unbiased estimate of performance.
For various reasons some of the above might be missing from a given setup, but this is the standard approach, and any modification requires good reasons.
Often datasets do not specify a "validation" split, as fitting hyperparameters etc. is considered part of the training; thus every data point used for that is de facto used to train your model (and so is part of the "train" dataset). In practice, this means that you have to split the train set on your own into "proper train" and "validation" (if the method being used requires fitting some additional hyperparameters).
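For instance (a minimal sketch assuming scikit-learn; the SVC model, its parameter grid, and the random data are placeholders), that split can be done explicitly, or implicitly via cross-validation on the train data while fitting hyperparameters:

```python
import numpy as np
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC

X_train_full = np.random.rand(500, 10)   # the dataset's published "train" split
y_train_full = np.random.randint(0, 2, 500)

# Option 1: carve an explicit validation set out of the published train data.
X_tr, X_val, y_tr, y_val = train_test_split(
    X_train_full, y_train_full, test_size=0.2, random_state=0)

# Option 2: let cross-validation over the train data play the validation role
# while fitting a hyperparameter (here, C for an SVM).
search = GridSearchCV(SVC(), param_grid={"C": [0.1, 1, 10]}, cv=5)
search.fit(X_train_full, y_train_full)
print(search.best_params_)
```

Either way, the official test set stays untouched until the very end.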
Upvotes: 3