Reputation: 1405
I'm trying to implement grid search, or a more sophisticated hyperparameter search, in Vowpal Wabbit. Is there a relatively simple way to get the loss function value obtained on a validation set (the holdout set in vw) for this purpose? VW must compute it, e.g. for every number of passes, because early stopping happens depending on its value.
So far I have worked around this by creating a separate file with a validation dataset, saving different models' predictions on it, and comparing their performance in Python, thereby wasting data unnecessarily. But maybe there is a way to use vw's holdout scores explicitly?
Upvotes: 2
Views: 1185
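The workaround described in the question (saving each model's predictions on a validation file and comparing them in Python) can be sketched as follows. This is a minimal illustration, not code from the post: the function name, the choice of squared loss, and the toy data are all assumptions.

```python
def mean_squared_loss(predictions_text, vw_data_text):
    """Compare vw predictions (one value per line, optionally followed by a
    tag) against the gold labels of a vw-format data file, where each line
    starts with the label before the first '|'. Returns the average squared
    loss. Both arguments are raw file contents as strings."""
    preds = [float(line.split()[0])
             for line in predictions_text.splitlines() if line.strip()]
    labels = [float(line.split('|')[0].split()[0])
              for line in vw_data_text.splitlines() if line.strip()]
    assert len(preds) == len(labels), "prediction/label count mismatch"
    return sum((p - y) ** 2 for p, y in zip(preds, labels)) / len(preds)

# Toy example (made-up values, not real vw output):
preds = "0.5\n1.2\n"
data = "1 | a:1 b:2\n1.0 | a:0.5\n"
print(mean_squared_loss(preds, data))
```

In practice one would read `predictions.txt` (written by `vw -p`) and the validation file from disk and pick the lowest-loss model.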
Reputation: 2670
To summarize the comments, there are several ways to get the holdout loss from VW (they can be combined):

- By default (unless --holdout_off is specified), VW computes the holdout loss on each 10th example (not on a random 1/10 of the examples). Using --holdout_period one can specify a number other than 10.
- --holdout_after=N specifies that the first N examples of the input data are used for training and the rest of the file as the holdout set (instead of each 10th example).
- Use -p predictions.txt and compute the loss outside of VW (by comparing predictions.txt with the gold labels in the input data). When X passes are used, predictions.txt will contain X*number_of_input_data_examples predictions. It is therefore recommended to train on the training data (possibly with multiple passes), save the model to a file, and then use VW only to predict: vw -i trained.model -t -d test.input -p test.predictions.
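Putting the flags above together, a possible end-to-end session might look like the following sketch. The file names and pass/example counts are illustrative, not taken from the post; no assertions are attached since it is a CLI fragment that requires an installed vw binary.

```shell
# 1. Train with multiple passes (a cache file is required for --passes);
#    here the first 8000 examples train and the rest form the holdout set
#    used for early stopping. -f saves the final model.
vw -d train.input --passes 10 --cache_file train.cache \
   --holdout_after 8000 -f trained.model

# 2. Predict only (-t disables learning) on a separate file; vw reports
#    the average loss, and -p writes one prediction per example.
vw -i trained.model -t -d test.input -p test.predictions
```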
- --save_per_pass, or running vw --daemon and saving the model on demand, may also be helpful.

Upvotes: 2