orenob

Reputation: 11

Use Vowpal Wabbit with probabilities as labels to predict probabilities

I'm trying to use Vowpal Wabbit to predict probabilities from an existing set of statistics. My txt file looks like this:

0.22 | Features1
0.28 | Features2

Now, given this example, I want to predict the label (probability) for Features3. I'm trying to use logistic regression:

vw -d ds.vw.txt -f model.p --loss_function=logistic --link=logistic -p probs.txt

But I get this error:

You are using label 0.00110011 not -1 or 1 as loss function expects!
You are using label 0.00559702 not -1 or 1 as loss function expects!

etc.

How can I use these statistics as labels to predict probabilities?

Upvotes: 1

Views: 494

Answers (1)

arielf

Reputation: 5952

To predict a continuous label you need to use one of the following loss functions:

--loss_function squared    # minimizes squared error (predictions track the mean)
--loss_function quantile   # with the default --quantile_tau 0.5, predictions track the median

--loss_function squared is the vw default, so you may leave it out.

Another trick is to map your probability range onto [-1, 1], sending the midpoint 0.5 to 0.0, with the function (2*probability - 1). You can then use --loss_function logistic, which requires binary labels (-1 and 1): use the sign of the mapped value as the label, and follow the label with the absolute value of the mapped value as a floating-point importance weight. The lines then have this form:

1 0.22 | features...
-1 0.28 | features...
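
The mapping can be scripted before training. The following is a minimal Python sketch, not part of vw itself: the function name to_weighted_binary is made up for illustration, and it assumes each input line has the form "probability | features" with the probability in [0, 1].

```python
def to_weighted_binary(line):
    """Turn 'p | features' into '<label> <weight> | features'.

    Hypothetical helper: label is the sign of (2*p - 1) and the
    importance weight is its absolute value.
    """
    label_part, sep, features = line.partition("|")
    if not sep:
        raise ValueError(f"no '|' separator in line: {line!r}")
    p = float(label_part.strip())
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"probability out of range: {p}")
    mapped = 2.0 * p - 1.0              # 0.5 maps to 0.0
    label = 1 if mapped >= 0 else -1    # sign becomes the binary label
    weight = abs(mapped)                # distance from 0.5 becomes the weight
    return f"{label} {weight:.6g} |{features}"


if __name__ == "__main__":
    for raw in ["0.22 | Features1", "0.28 | Features2"]:
        print(to_weighted_binary(raw))
```

Note that a probability of exactly 0.5 maps to a weight of 0, so under this scheme such a line contributes nothing to training.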

This may or may not work better for your particular data. You'll have to hold out some of your data and compare the accuracy of the different models on it.
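
One way to run that comparison is sketched below in Python. It is an assumption-laden illustration rather than a vw feature: split_holdout and mean_squared_error are hypothetical helper names, the 20% hold-out fraction is arbitrary, and the predictions are assumed to have been read back from a vw predictions file and put on the same 0 to 1 scale as the original labels.

```python
import random


def split_holdout(lines, holdout_fraction=0.2, seed=0):
    """Shuffle the examples and return (train_lines, test_lines)."""
    rng = random.Random(seed)           # fixed seed keeps the split reproducible
    shuffled = list(lines)
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * holdout_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def mean_squared_error(predictions, targets):
    """Average squared difference between predicted and true probabilities."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets differ in length")
    if not predictions:
        raise ValueError("nothing to score")
    total = sum((p - t) ** 2 for p, t in zip(predictions, targets))
    return total / len(predictions)
```

Train each candidate model on the training lines only, predict on the held-out lines, and score every model with the same metric against the original probabilities. The lower error on held-out data indicates the better fit.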

Background regarding binary outcomes: the vw "starting point" (i.e., the null or initial model) has weights of 0.0 everywhere. This is why, when you do a logistic regression, the negative and positive labels must be -1 and 1 respectively (rather than 0 and 1).
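
The symmetry behind this can be checked numerically. The Python sketch below assumes the standard logistic loss log(1 + exp(-y * s)) for a label y in {-1, 1} and a raw score s. The function names are illustrative and are not taken from the vw source.

```python
import math


def logistic_loss(label, raw_score):
    """Standard logistic loss for a label in {-1, 1}."""
    return math.log1p(math.exp(-label * raw_score))


def sigmoid(raw_score):
    """Logistic link: maps a raw score to a probability."""
    return 1.0 / (1.0 + math.exp(-raw_score))


# An all-zero model produces a raw score of 0.0 for every example.
initial_score = 0.0
loss_positive = logistic_loss(1, initial_score)
loss_negative = logistic_loss(-1, initial_score)
initial_probability = sigmoid(initial_score)
```

With a raw score of 0.0, both labels incur the same loss, log(2), and the logistic link gives 0.5, so the all-zero starting model sits exactly midway between the two classes.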

Upvotes: 1
