Andrei

Reputation: 1373

vowpal-wabbit: use of multiple passes, holdout, & holdout-period to avoid overfitting?

I would like to train a binary sigmoidal feedforward network for category classification with the following command, using the awesome vowpal wabbit tool:

vw --binary --nn 4 train.vw -f category.model

And test it:

vw --binary -t -i category.model -p test.vw

But I got very bad results (compared to my linear SVM estimator).

I found a comment saying that I should use the number-of-training-passes argument (--passes arg).

So my question is: how do I choose the number of training passes so as not to end up with an overfitted model?

P.S. Should I use the --holdout_period argument? And how?

Upvotes: 2

Views: 1412

Answers (1)

arielf

Reputation: 5952

The test command in the question is incorrect: it has no input file (-p ... specifies where to write output predictions). It is also unclear whether you want to test or predict, because the question says test but the command uses -p ...

Test means you have labeled-data and you're evaluating the quality of your model. Strictly speaking: predict means you don't have labels, so you can't actually know how good your predictions are. Practically, you may also predict on held-out, labeled data (pretending it has no labels by ignoring them) and then evaluate how good these predictions are, since you actually have labels.
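A minimal sketch of the "predict on held-out labeled data, then evaluate" approach described above, using plain shell tools. The file names preds.txt and labels.txt are hypothetical stand-ins for a vw predictions file (one {-1, 1} prediction per line, as -p would produce with --binary) and the matching true labels:

```shell
# Hypothetical data: one prediction per line, one true label per line.
printf '1\n-1\n1\n' > preds.txt
printf '1\n1\n1\n' > labels.txt

# Misclassification rate = fraction of lines where prediction != label.
paste preds.txt labels.txt | awk '$1 != $2 { err++ } END { printf "%.4f\n", err/NR }'
```

Here one of three predictions disagrees with its label, so the sketch prints 0.3333.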

Generally:

  • if you want to do binary classification, you should use labels in {-1, 1} and use --loss_function logistic. --binary is an independent option meaning you want the predictions themselves to be binary (which gives you less info).

  • if you already have a separate, labeled test-set, you don't need a holdout.
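If your training data is labeled with {0, 1} rather than the {-1, 1} that --loss_function logistic expects, a one-line awk filter can remap the label field. This is a hedged sketch; the sample input lines are made up, assuming vw's usual "label | features" format:

```shell
# Map a leading 0 label to -1; leave 1 labels and features untouched.
printf '0 | a:1 b:2\n1 | c:3\n' | awk '{ if ($1 == 0) $1 = -1; print }'
```

Note that assigning to $1 makes awk rebuild the record with single-space separators, which is harmless for vw input.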

The holdout mechanism in vw was designed to replace the test-set and avoid over-fitting. It is only relevant when multiple passes are used, because in a single pass all examples are effectively held-out: each next (yet unseen) example is treated 1) as unlabeled for prediction, and 2) as labeled for testing and model-update. In other words: your train-set is effectively also your test-set.

So you can either do multiple passes on the train-set with no holdout:

 vw --loss_function logistic --nn 4 -c --passes 2 --holdout_off train.vw -f model

and then test the model with a separate, labeled test-set:

 vw -t -i model test.vw

or do multiple passes on the same train-set, with some of it held out as a test set:

vw --loss_function logistic --nn 4 -c --passes 20 --holdout_period 7 train.vw -f model

If you don't have a test-set, and you want to fit more strongly by using multiple passes, you can ask vw to hold out every Nth example (the default N is 10, but you may override it explicitly using --holdout_period <N> as seen above). In this case, you can specify a higher number of passes, because vw will automatically do early termination when the loss on the held-out set starts growing.
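The every-Nth-example split can be sketched with ordinary shell tools. This is purely conceptual (not vw itself), and the exact offset of which example counts as the "Nth" is an implementation detail of vw; the sketch just illustrates the roughly 1-in-N partition that --holdout_period 7 produces:

```shell
# Conceptual sketch: with a holdout period of 7, roughly every 7th
# example (7, 14, ...) is held out for loss measurement instead of
# being used to update the model.
seq 1 14 | awk '{ tag = (NR % 7 == 0) ? "held-out" : "train"; print NR, tag }'
```

Out of 14 examples, exactly 2 land in the held-out set; the rest are trained on.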

You'll notice you hit early termination because vw will print something like:

passes used = 5
...
average loss = 0.06074 h

Indicating that only 5 of the requested 20 passes were actually used before early stopping, and the average loss on the held-out subset of examples is 0.06074 (the trailing h indicates this is held-out loss).

As you can see, the number of passes, and the holdout-period are completely independent options.

To improve the model and gain more confidence in it, you could use other optimizations, vary --holdout_period, or try other --nn arguments. You may also want to check the vw-hypersearch utility (in the utl subdirectory) to help find better hyper-parameters.

Here's an example of using vw-hypersearch on one of the test-sets included with the source:

$ vw-hypersearch 1 20 vw --loss_function logistic --nn % -c --passes 20 --holdout_period 11 test/train-sets/rcv1_small.dat --binary
trying 13 ............. 0.133333 (best)
trying 8 ............. 0.122222 (best)
trying 5 ............. 0.088889 (best)
trying 3 ............. 0.111111
trying 6 ............. 0.1
trying 4 ............. 0.088889 (best)
loss(4) == loss(5): 0.088889
5       0.08888

Indicating that either 4 or 5 should be a good argument for --nn, yielding a loss of 0.08888 on a held-out subset of 1 in 11 examples.

Upvotes: 4
