intl

Reputation: 2773

Vowpal Wabbit training and testing data formats

I'm trying Vowpal Wabbit and am in the process of figuring out the file formats required for training and testing. I've been following the tutorial from https://github.com/JohnLangford/vowpal_wabbit/wiki/Tutorial and see that the following is the training data format:

0 | price:.23 sqft:.25 age:.05 2006
1 2 'second_house | price:.18 sqft:.15 age:.35 1976
0 1 0.5 'third_house | price:.53 sqft:.32 age:.87 1924

For the testing data, I don't have the labels or any outputs, but just the features. How would I go about writing that out? I've tried just including the features like so:

price:.23 sqft:.25 age:.05 2006
price:.18 sqft:.15 age:.35 1976
price:.53 sqft:.32 age:.87 1924

But that gives me exceptions, as it's not the proper format. I have also tried the following two variants, and both give me just 0's as results:

| price:.23 sqft:.25 age:.05 2006
| price:.18 sqft:.15 age:.35 1976
| price:.53 sqft:.32 age:.87 1924

0 0 0 | price:.23 sqft:.25 age:.05 2006
0 0 0 | price:.18 sqft:.15 age:.35 1976
0 0 0 | price:.53 sqft:.32 age:.87 1924

Does anyone know the format I should be aiming for, knowing only the features? Thanks for the help.

Upvotes: 5

Views: 2918

Answers (1)

Martin Popel

Reputation: 2670

The bar symbol (|) is also required in the format used for predictions:

| price:.23 sqft:.25 age:.05 2006
| price:.18 sqft:.15 age:.35 1976
| price:.53 sqft:.32 age:.87 1924

If you don't include the correct labels, vw cannot compute the test loss, of course. To get the predictions, run vw -d test_set.vw -t -p predictions.txt. Note that the training set in the tutorial (only three examples) is too small to train any reasonable model.
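If the test file is generated by a script, the unlabeled format can be sketched roughly as below. This is only an illustration: the helper name to_vw_line and the convention of using None for a feature without an explicit value (such as 2006 above) are assumptions made for this example, not part of vw itself.

```python
def to_vw_line(features, label=None):
    """Format one example as a line of VW text input.

    `features` maps feature names to numeric values; a value of None
    (a convention of this sketch) writes the bare feature name.
    When `label` is None, the line starts directly with the bar,
    which is the form used for unlabeled test data.
    """
    parts = []
    for name, value in features.items():
        parts.append(name if value is None else f"{name}:{value}")
    prefix = "" if label is None else f"{label} "
    return f"{prefix}| " + " ".join(parts)


if __name__ == "__main__":
    # Hypothetical test rows mirroring the examples in the question.
    rows = [
        {"price": 0.23, "sqft": 0.25, "age": 0.05, "2006": None},
        {"price": 0.18, "sqft": 0.15, "age": 0.35, "1976": None},
        {"price": 0.53, "sqft": 0.32, "age": 0.87, "1924": None},
    ]
    # Write one unlabeled example per line.
    with open("test_set.vw", "w") as out:
        for row in rows:
            out.write(to_vw_line(row) + "\n")
```

The resulting test_set.vw can then be passed to the vw command above. One thing worth checking if the predictions still come out as all zeros: the test run probably needs a trained model, for instance one saved during training with -f model.vw and loaded at test time with -i model.vw (the file name here is only a placeholder).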

Upvotes: 6
