Gilaztdinov Rustam

Vowpal Wabbit unbalanced classes

I'm trying to fit a binary classification model and predict the probability that each example belongs to each class.

My first problem is that I can't interpret the results. My training set has labels 0 and 1 (not -1 and +1).

I run the model:

vw train.vw -f model.vw --link=logistic

Next:

vw test.vw -t -i model.vw -p pred.txt

Then I have a file pred.txt with these values:

0.5 0.5111 0.5002 0.5093 0.5

I don't understand what 0.5 means. All the values in pred.txt are close to 0.5. I wrote a script that subtracts 0.5 from each result and got these values:

0 0.111 0.002 0.093 0

Is that my desired probability?

And here is my second problem: my target classes are unbalanced, with 95% negative (0) and 5% positive (1) examples. How can I tell VW to account for this imbalance, e.g. with class weights like {class 0: 0.1, class 1: 0.9}?

Or should this be done when preparing the dataset?

Answers (1)

Martin Popel

For binary classification in VW, the labels need to be converted from 0 and 1 to -1 and +1, e.g. with sed -e 's/^0/-1/'.
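
The sed one-liner rewrites a leading 0 on each line. A rough Python equivalent is sketched here, assuming every example line starts with the label followed by a space (e.g. "0 |f a b"). The helper names to_plus_minus_one and convert_file, and the output file name, are hypothetical. Unlike a bare regex on the whole line, it only looks at the first token, so a line whose label is something like "0.5" would be left untouched rather than turned into "-1.5".

```python
def to_plus_minus_one(line: str) -> str:
    """Relabel a VW example line from 0 to -1; label 1 is left as-is.

    Assumes the label is the first space-separated token of the line.
    """
    label, sep, rest = line.partition(" ")
    if label == "0":
        label = "-1"
    return label + sep + rest


def convert_file(src: str, dst: str) -> None:
    """Stream src to dst, relabelling each example (hypothetical helper)."""
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            fout.write(to_plus_minus_one(line))


if __name__ == "__main__":
    # Hypothetical output name; train.vw is the file from the question.
    convert_file("train.vw", "train_pm1.vw")
```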

In addition to --link=logistic, you also need to use --loss_function=logistic if you want to interpret the predictions as probabilities.
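
To see why subtracting 0.5 does not turn the predictions into probabilities: --link=logistic passes the model's raw linear score through the logistic (sigmoid) function, so a prediction of exactly 0.5 corresponds to a raw score of 0, i.e. the model is undecided. The small sketch below (function names are illustrative, not part of VW) maps the values from pred.txt back to raw scores. Assuming the model was trained with --loss_function=logistic as described above, the values in pred.txt are already the estimated probabilities of the positive class; values this close to 0.5 suggest the model has learned very little.

```python
import math


def logistic(raw_score: float) -> float:
    """Numerically stable sigmoid: maps a raw linear score to (0, 1)."""
    if raw_score >= 0:
        return 1.0 / (1.0 + math.exp(-raw_score))
    e = math.exp(raw_score)  # avoids overflow for very negative scores
    return e / (1.0 + e)


def raw_score(probability: float) -> float:
    """Inverse of logistic (the logit); defined for 0 < probability < 1."""
    return math.log(probability / (1.0 - probability))


if __name__ == "__main__":
    # Prediction values taken from pred.txt in the question.
    for p in [0.5, 0.5111, 0.5002, 0.5093]:
        print(f"p={p:.4f}  raw score={raw_score(p):+.4f}")
```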

For unbalanced classes, you need to use importance weighting and tune the importance weight constant on a held-out set (or with cross-validation), using an external evaluation metric of your choice (e.g. AUC or F1).
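
In VW's text input format an importance weight can follow the label ("label importance |namespace features"), so one way to weight the rare positive class is to rewrite the training file before calling vw. The sketch below assumes labels are already -1/+1 and that lines carry no existing importance weight or tag; all function names are hypothetical. The negative-to-positive ratio (19 for a 95%/5% split) is only a common heuristic starting point, and the ratio implied by the question's {class 0: 0.1, class 1: 0.9} (about 9) is another candidate. The auc helper computes ROC AUC with the rank-sum (Mann-Whitney) formula for scoring held-out predictions.

```python
def initial_positive_weight(labels):
    """Heuristic starting point: ratio of negatives to positives (labels in {-1, +1})."""
    n_pos = sum(1 for y in labels if y > 0)
    n_neg = len(labels) - n_pos
    return n_neg / n_pos


def add_importance_weight(line: str, positive_weight: float) -> str:
    """Insert an importance weight after the label of positive examples.

    Assumes lines look like "<label> |<namespace> <features>" with no existing
    importance weight or tag. Negative examples keep VW's default weight of 1.
    """
    label, sep, rest = line.partition(" ")
    if label in ("1", "+1"):
        return f"{label} {positive_weight:g}{sep}{rest}"
    return line


def auc(labels, scores):
    """ROC AUC via the rank-sum (Mann-Whitney U) formula; tied scores share their average rank."""
    pairs = sorted(zip(scores, labels))
    ranks = [0.0] * len(pairs)
    i = 0
    while i < len(pairs):
        j = i
        while j + 1 < len(pairs) and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    n_pos = sum(1 for _, y in pairs if y > 0)
    n_neg = len(pairs) - n_pos
    pos_rank_sum = sum(r for r, (_, y) in zip(ranks, pairs) if y > 0)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

A tuning loop would then try a handful of weights (for example 1, 5, 9, 19, 40), rewrite the training file with add_importance_weight for each, retrain with the same vw command, predict on the held-out set, and keep the weight with the best auc. Because AUC depends only on the ranking of scores, the raw or the probability predictions can be used interchangeably.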

See also:

Calculating AUC when using Vowpal Wabbit

Vowpal Wabbit Logistic Regression

How to perform logistic regression using vowpal wabbit on very imbalanced dataset
