Reputation: 1631
I am currently exploring PU learning, i.e. learning from positive and unlabeled data only. One of the publications [Zhang, 2009] asserts that it is possible to learn in this setting by modifying the loss function of a binary classifier with probabilistic output (for example, logistic regression). The paper states that one should optimize balanced accuracy.
Vowpal Wabbit currently supports five loss functions [listed here]. I would like to add a custom loss function that optimizes for AUC (ROC) or, equivalently, following the paper, for 1 - balanced accuracy.
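For concreteness, this is the metric in question as I understand it (my own sketch, not code from the paper):

```python
# Sketch of balanced accuracy for binary labels in {0, 1};
# names and layout are my own, not from [Zhang, 2009].
import numpy as np

def balanced_accuracy(y_true, y_pred):
    """Mean of the true-positive rate and the true-negative rate."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tpr = np.mean(y_pred[y_true == 1] == 1)  # sensitivity
    tnr = np.mean(y_pred[y_true == 0] == 0)  # specificity
    return 0.5 * (tpr + tnr)
```

The loss to minimize would then be 1 - balanced_accuracy(y_true, y_pred).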
I am unsure where to start. Looking at the code reveals that I need to provide the first and second derivatives and some other information. I could also run the standard algorithm with logistic loss and try to adjust l1 and l2 according to my objective (I am not sure this is sound). I would be glad to get any pointers or advice on how to proceed.
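For comparison, here is what those pieces look like for the logistic loss VW already ships: a sketch of the math only, not of VW's actual C++ interface.

```python
# The three pieces a per-example loss must supply: the loss itself and
# its first and second derivatives with respect to the raw prediction p,
# shown for logistic loss with labels y in {-1, +1}. Math sketch only;
# this is not VW's real C++ interface.
import math

def logistic_loss(p, y):
    return math.log(1.0 + math.exp(-y * p))

def first_derivative(p, y):
    # d/dp log(1 + exp(-y*p)) = -y / (1 + exp(y*p))
    return -y / (1.0 + math.exp(y * p))

def second_derivative(p, y):
    # d^2/dp^2 = exp(y*p) / (1 + exp(y*p))^2, using y^2 = 1
    e = math.exp(y * p)
    return e / (1.0 + e) ** 2
```

The catch with AUC is that it is defined over pairs of examples, so it does not decompose into a per-example loss with derivatives like these; that is what the update below refers to.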
UPDATE: Further searching revealed that it is difficult, if not impossible, to optimize for AUC in an online-learning setting: answer
Upvotes: 3
Views: 1199
Reputation: 1631
I found two software packages that are ready to do PU learning out of the box:
(1) SVM perf from Joachims
Use the "-l 10" option here!
(2) Sofia-ml
Use the "--loop_type roc" option here!
In general you set "+1" labels on your positive examples and "-1" on all unlabeled ones. Then you launch the training procedure followed by prediction.
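To make the label setup concrete, here is a small sketch that writes training data in the SVMlight format both tools consume; the file name and feature values are invented:

```python
# Write PU training data in SVMlight format ("label index:value ...",
# indices ascending and 1-based). Positives get +1, every unlabeled
# example gets -1. Feature values here are made up for illustration.
def write_svmlight(path, positives, unlabeled):
    with open(path, "w") as f:
        for label, rows in ((+1, positives), (-1, unlabeled)):
            for features in rows:  # features: dict {index: value}
                feats = " ".join(f"{i}:{v}" for i, v in sorted(features.items()))
                f.write(f"{label:+d} {feats}\n")

write_svmlight("train.dat",
               positives=[{1: 0.5, 3: 1.2}],
               unlabeled=[{2: 0.7}, {1: 0.1, 2: 0.9}])
```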
Both programs give you some performance metrics. I would suggest using the standardized and well-established binary from the KDD Cup 2004, "perf". Get it here.
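If I remember correctly, perf reads one case per line as a "target prediction" pair; treat the exact format as an assumption and check perf's README. A sketch of preparing its input:

```python
# Write the two-column input I believe perf expects: true class in
# {0, 1}, then the predicted score, one case per line. Verify against
# perf's README; the file name and values are invented.
def write_perf_input(path, y_true, scores):
    with open(path, "w") as f:
        for t, s in zip(y_true, scores):
            f.write(f"{t} {s}\n")

write_perf_input("perf_input.txt", [1, 0, 1], [0.9, 0.2, 0.7])
```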
I hope this helps those wondering how this works in practice. Perhaps I have prevented the situation in this XKCD.
Upvotes: 2