Reputation: 11
I'm trying to use Vowpal Wabbit to predict probabilities from an existing set of statistics. My text file looks like this:
0.22 | Features1
0.28 | Features2
Now, given these examples, I want to predict the label (probability) for Features3. I'm trying to use logistic regression:
vw -d ds.vw.txt -f model.p --loss_function=logistic --link=logistic -p probs.txt
But I get the error:
You are using label 0.00110011 not -1 or 1 as loss function expects!
You are using label 0.00559702 not -1 or 1 as loss function expects!
etc.
How can I use these statistics as labels to predict probabilities?
Upvotes: 1
Views: 494
Reputation: 5952
To predict a continuous label you need to use one of the following loss functions:
--loss_function squared # optimizes for min loss vs mean
--loss_function quantile # optimizes for min loss vs median
--loss_function squared is the vw default, so you may leave it out.
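To see why the two losses target different summaries, here is a small pure-Python sketch. It does not call vw; the sample labels, the grid search, and the function names are made up for illustration, and the quantile loss is written in its usual "pinball" form with tau = 0.5 (the median case). It searches for the single constant prediction that minimizes each loss over a skewed sample:

```python
from statistics import mean, median

def squared_loss(pred, label):
    # Squared error for one example.
    return (pred - label) ** 2

def quantile_loss(pred, label, tau=0.5):
    # Pinball loss; with tau = 0.5 it is half the absolute error.
    diff = label - pred
    return tau * diff if diff >= 0 else (tau - 1.0) * diff

def best_constant(loss, labels, steps=1000):
    # Brute-force search over a grid on [0, 1] for the constant
    # prediction with the smallest total loss.
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda c: sum(loss(c, y) for y in labels))

# Hypothetical, deliberately skewed labels.
labels = [0.05, 0.10, 0.20, 0.30, 0.90]

best_sq = best_constant(squared_loss, labels)
best_q = best_constant(quantile_loss, labels)

print("squared  best constant:", best_sq, "mean:", mean(labels))
print("quantile best constant:", best_q, "median:", median(labels))
```

With an outlier like 0.90 in the sample, the squared-loss optimum is pulled toward it while the quantile optimum stays at the middle value, which is one way to decide between the two for your data.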
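Another trick you may use is to map your probability range into [-1, 1], so that the mid-point 0.5 becomes 0.0, using the function (2*probability - 1). You can then use --loss_function logistic, which requires binary labels (-1 and 1): use the sign of the mapped value as the label and follow it with the absolute value of the mapped value as a floating-point weight. For the two labels in the question, 0.22 maps to -0.56 and 0.28 maps to -0.44, so both become negative examples:
-1 0.56 | features...
-1 0.44 | features...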
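A minimal sketch of that conversion, assuming the weight is meant to be the absolute value of the mapped value (2*probability - 1) and the label is its sign. The function name to_weighted_binary_line is hypothetical, and treating exactly 0.5 as a positive example with weight 0 is an arbitrary choice:

```python
def to_weighted_binary_line(probability, features):
    # Build one "label weight | features" line from a probability label.
    if not 0.0 <= probability <= 1.0:
        raise ValueError(f"probability out of range: {probability}")
    mapped = 2.0 * probability - 1.0   # 0.5 -> 0.0, 0.0 -> -1.0, 1.0 -> 1.0
    label = 1 if mapped >= 0 else -1   # sign becomes the binary label
    weight = abs(mapped)               # distance from 0.5 becomes the weight
    return f"{label} {weight:g} | {features}"

# The two rows from the question.
rows = [(0.22, "Features1"), (0.28, "Features2")]
lines = [to_weighted_binary_line(p, f) for p, f in rows]
print("\n".join(lines))
```

Note that examples close to 0.5 get a weight near zero, so they contribute very little to training under this scheme.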
This may or may not work better for your particular data; you'll have to hold out some of your data and test the different models for accuracy.
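One way to set up such a hold-out comparison, sketched in plain Python. The helper names, the 20% split, and the toy examples are assumptions for illustration; the predictions would come from running each trained model on the held-out file:

```python
import random

def split_holdout(lines, holdout_fraction=0.2, seed=0):
    # Shuffle a copy with a fixed seed, then cut off the hold-out part.
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)
    n_holdout = max(1, int(len(shuffled) * holdout_fraction))
    return shuffled[n_holdout:], shuffled[:n_holdout]

def mean_squared_error(predictions, targets):
    # Average squared difference between predicted and true probabilities.
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets differ in length")
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

# Hypothetical data in the question's "probability | features" format.
examples = [f"{i / 10:g} | Features{i}" for i in range(1, 10)]
train, holdout = split_holdout(examples)
print(len(train), "training lines,", len(holdout), "held-out lines")
```

Train each candidate model on the training lines only, predict on the held-out lines, and compare the resulting errors against the original probabilities, whichever label encoding the model was trained with.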
Background regarding binary outcomes: vw's "starting point" (i.e., the null or initial model) has 0.0 weights everywhere. This is why, when you're doing a logistic regression, the negative and positive labels must be -1 and 1 (rather than 0 and 1), respectively.
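A small numeric illustration of that symmetry, assuming the usual form of the logistic loss for -1/1 labels, log(1 + exp(-label * prediction)), and the usual logistic link. The function names are made up for this sketch and it does not call vw:

```python
import math

def logistic_loss(label, raw_prediction):
    # Logistic loss in its -1/1 label form.
    return math.log(1.0 + math.exp(-label * raw_prediction))

def logistic_link(raw_prediction):
    # Maps a raw score to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-raw_prediction))

initial = 0.0  # raw prediction of an all-zero model

print("probability at start:", logistic_link(initial))
print("loss for label +1:", logistic_loss(1, initial))
print("loss for label -1:", logistic_loss(-1, initial))

# With a 0 label the product label * prediction vanishes, so the loss
# no longer depends on the prediction at all.
print("loss for label 0 at +3:", logistic_loss(0, 3.0))
print("loss for label 0 at -3:", logistic_loss(0, -3.0))
```

The all-zero model sits exactly between the two classes: both labels incur the same loss and the linked output is 0.5. In this form of the loss, a 0 label would give the same value for any prediction, so such examples would carry no signal.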
Upvotes: 1