aaron

Reputation: 6489

Possible to output Vowpal Wabbit predictions to .txt along with observed target values?

We're writing a forecasting application that uses Vowpal Wabbit and are looking to automate as much of our model validation process as we can. Anyone know whether vw has a native utility to output the target values in a test file along with the predictions from a vw model? These values are printed to the terminal output during prediction. Is there an argument to the regular vw call, or perhaps a tool in the utl folder that prints targets and forecasts together on a row-wise basis?

Here's what the code I'm using now for prediction looks like:

vw -d /path/to/data/test.vw -t -i lg.vw --link=logistic -p predictions.txt

My goal is to produce from within Vowpal an output file that looks like this:

Predicted  Target
0.78       1
0.23       0
0.49       1

...

UPDATE

@arielf's code worked like a charm. I've only made one minor addition to print the streaming results to a validation.txt file:

vw -d test.vw -t -i lg.vw --link=logistic -P 1 2>&1 | \
     perl -ane 'print "$F[5]\t$F[4]\n" if (/^\d/)' > validation.txt

Upvotes: 1

Views: 1220

Answers (1)

arielf

Reputation: 5952

Try this:

vw -d test.vw -t -i lg.vw --link=logistic -P 1 2>&1 | \
    perl -ane 'print "$F[5]\t$F[4]\n" if (/^\d/)'

Explanation:

-P 1     # Add option: set vw progress report to apply to every example

Note: -P is a capital P (alias for --progress), 1 is the progress printing interval.

Note that you don't need to write predictions with -p ... since that is redundant in this case (predictions are already included in the vw progress lines).

A progress report line, with its headers, looks like this:

average   since     example    example   current  current   current
loss      last      counter     weight     label  predict  features
0.000494  0.000494        1        1.0   -0.0222   0.0000        14

Since the progress report goes to stderr, we need to redirect stderr to stdout (2>&1).

Now we pipe the vw progress output into perl for simple post-processing. The perl command loops over each line of input without printing by default (-n), auto-splits each line into fields on whitespace (-a), and applies the expression (-e), which prints $F[5] and $F[4] separated by a TAB and terminated by a newline if the line starts with a digit (in order to skip whatever isn't a progress line, e.g. headers, preambles and summary lines). Perl arrays are 0-indexed, so $F[5] and $F[4] are the 6th and 5th columns of the progress line: "current predict" and "current label". I reversed the field order because vw progress lines have the observed value before the predicted value and you asked for the opposite order.
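If you'd rather do the post-processing in Python, the same filter might look roughly like the sketch below. It is only an illustration, not part of vw: the function name extract_pairs and the column constants are my own, and it assumes the progress-line layout shown above (label in column index 4, prediction in column index 5). Other vw versions or options may print different columns, so check your own output first.

```python
import re

# 0-based column positions in a vw progress line, ASSUMING the layout
# shown in the answer:
# avg loss, since last, example counter, example weight, label, predict, features
LABEL_COL = 4
PREDICT_COL = 5


def extract_pairs(lines):
    """Yield (predicted, target) string pairs from vw progress output."""
    for line in lines:
        # Progress lines start with a digit; headers, preamble and
        # summary lines do not, so they are skipped (same test as /^\d/).
        if not re.match(r"\d", line):
            continue
        fields = line.split()  # split on runs of whitespace, like perl -a
        if len(fields) <= PREDICT_COL:
            continue  # too few columns to be a progress line
        yield fields[PREDICT_COL], fields[LABEL_COL]


if __name__ == "__main__":
    # Demo on the sample progress report from the answer.
    sample = [
        "average   since     example    example   current  current   current",
        "loss      last      counter     weight     label  predict  features",
        "0.000494  0.000494        1        1.0   -0.0222   0.0000        14",
    ]
    for predicted, target in extract_pairs(sample):
        print(f"{predicted}\t{target}")
```

On the sample line this prints 0.0000 and -0.0222 separated by a TAB. To use it as a filter in the pipeline, pass sys.stdin to extract_pairs instead of the sample list and pipe the vw output (with 2>&1) into the script.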

UPDATE

Aaron published a working example using this solution in Google Drive: https://drive.google.com/open?id=0BzKSYsAMaJLjZzJlWFA2N3NnZGc

Upvotes: 4
