Reputation: 2411
I'm trying to fit a model for binary classification and predict the probability that each example belongs to each class.
My first problem is that I can't interpret the results. I have a training set in which the labels are 0 and 1 (not -1 and +1).
I run the model:
vw train.vw -f model.vw --link=logistic
Next:
vw test.vw -t -i model.vw -p pred.txt
Then I have a file pred.txt with these values:
0.5
0.5111
0.5002
0.5093
0.5
I don't understand what 0.5 means. All values in pred.txt are about 0.5. I wrote a script that subtracts 0.5 from each result, and I get these lines:
0
0.111
0.002
0.093
0
Is that my desired probability?
And here is my second problem: my target class is unbalanced. I have 95% negative (0) and 5% positive (1) examples. How can I tell VW to compensate for the class imbalance, e.g. with class weights like {class 0: 0.1, class 1: 0.9}?
Or should that be done when preparing the dataset?
Upvotes: 2
Views: 1043
Reputation: 2670
For binary classification in VW, the labels need to be converted (from 0 and 1) to -1 and +1, e.g. with sed -e 's/^0/-1/'.
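As a minimal sketch of that conversion, the function below rewrites a leading 0 label to -1 and leaves everything else untouched (VW reads 1 the same as +1). The function name convert_labels and the output file name in the usage note are made up for illustration, and the sketch assumes each line starts with the label followed by a space or a | character.

```shell
# Hypothetical helper: turn 0/1 labels into -1/1 for VW binary classification.
# Assumes the label is the first token on the line, followed by ' ' or '|'.
convert_labels() {
  sed -e 's/^0\([ |]\)/-1\1/'
}

# Small demo on two inline examples instead of a real file.
printf '0 | a b\n1 | c d\n' | convert_labels
```

On a real data set this would be used as convert_labels < train.vw > train_pm1.vw, where train_pm1.vw is just an example name. Anchoring on the following space or | is slightly stricter than the one-liner above, so a first token such as 0.5 would not be rewritten.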
In addition to --link=logistic, you also need to use --loss_function=logistic if you want to interpret the predictions as probabilities.
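To see why the predictions in the question cluster around 0.5, it may help to look at what the logistic link does: it passes the raw score through the standard sigmoid 1 / (1 + exp(-x)). The sketch below reproduces that mapping with awk; the function name sigmoid is made up for illustration. Under the default squared loss with 0/1 labels, the raw scores presumably stay between 0 and 1, so the linked values would land between 0.5 and roughly 0.73, and subtracting 0.5 from them does not produce a probability.

```shell
# Hypothetical helper: apply the standard sigmoid to one raw score per line.
sigmoid() {
  awk '{ printf "%.4f\n", 1 / (1 + exp(-$1)) }'
}

# A raw score of 0 maps to exactly one half.
printf '0\n' | sigmoid   # prints 0.5000

# Raw scores in the 0..1 range all map to values at or above 0.5.
printf '0\n0.5\n1\n' | sigmoid
```

With -1/+1 labels and logistic loss, the raw score is a log-odds estimate that can go negative, so the linked output can cover the whole range from 0 to 1.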
For unbalanced classes, you need to use importance weighting and tune the importance weight constant on a held-out set (or with cross-validation), using an external evaluation metric of your choice (e.g. AUC or F1).
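In VW's input format the importance weight is an optional number placed right after the label, so one way to apply it is while preparing the data set. The sketch below adds a weight to every positive example; the function name add_importance and the value 19 (the 95:5 class ratio from the question) are assumptions used as a starting point, and the constant should still be tuned as described above. It assumes labels are already -1 and 1 (or +1) and that each line starts with the label followed by a space.

```shell
# Hypothetical helper: insert importance weight $1 after the label of
# every positive example; negative examples pass through unchanged.
add_importance() {
  awk -v w="$1" '
    $1 == "1" || $1 == "+1" { sub(/^[+]?1/, "& " w) }  # "1 | f" -> "1 <w> | f"
    { print }
  '
}

# Small demo on two inline examples instead of a real file.
printf '1 | a b\n-1 | c d\n' | add_importance 19
```

The weighted file can then be used for training with the same --loss_function=logistic and --link=logistic options, while the held-out set used for tuning keeps its original class proportions.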
See also:
Calculating AUC when using Vowpal Wabbit
Vowpal Wabbit Logistic Regression
How to perform logistic regression using vowpal wabbit on very imbalanced dataset
Upvotes: 3