Vladislavs Dovgalecs

Reputation: 1631

How to use a custom loss function (PU learning)

I am currently exploring PU learning, i.e. learning from positive and unlabeled data only. One of the publications [Zhang, 2009] asserts that it is possible to learn by modifying the loss function of a binary classifier with probabilistic output (for example, logistic regression). The paper states that one should optimize balanced accuracy.

Vowpal Wabbit currently supports five loss functions [listed here]. I would like to add a custom loss function that optimizes AUC (ROC) or, equivalently per the paper, 1 - balanced accuracy.

I am unsure where to start. Looking at the code reveals that I would need to provide the first and second derivatives of the loss, plus some other information (as sketched below). Alternatively, I could run the standard algorithm with logistic loss and tune l1 and l2 toward my objective (I am not sure this is sound). I would be glad to get any pointers or advice on how to proceed.
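For concreteness, here is a minimal illustrative sketch, in Python rather than VW's actual C++ plug-in interface, of the three pieces such a loss definition needs: the loss itself and its first and second derivatives with respect to the raw prediction, using logistic loss as the example.

    import math

    # Python stand-ins (purely illustrative; VW's real loss plug-ins are C++)
    # for the three pieces a loss definition needs: the loss and its first and
    # second derivatives w.r.t. the raw prediction p, for labels y in {-1, +1}.

    def logistic_loss(p, y):
        # L(p, y) = log(1 + exp(-y * p))
        return math.log1p(math.exp(-y * p))

    def logistic_loss_d1(p, y):
        # dL/dp = -y * sigmoid(-y * p)
        return -y / (1.0 + math.exp(y * p))

    def logistic_loss_d2(p, y):
        # d2L/dp2 = sigmoid(y * p) * sigmoid(-y * p); independent of the sign of y
        s = 1.0 / (1.0 + math.exp(-y * p))
        return s * (1.0 - s)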

UPDATE: Further searching revealed that it is difficult, if not impossible, to optimize directly for AUC in online learning: answer

Upvotes: 3

Views: 1199

Answers (1)

Vladislavs Dovgalecs

Reputation: 1631

I found two software packages that can do PU learning out of the box:

(1) SVM perf from Joachims

Use the "-l 10" option here!

(2) Sofia-ml

Use the "--loop_type roc" option here!

In general, you set "+1" labels on your positive examples and "-1" on all unlabeled ones. Then you launch the training procedure, followed by prediction.
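For illustration, here is a minimal Python sketch of that labeling scheme, assuming scikit-learn's logistic regression as the probabilistic classifier and synthetic placeholder data (neither of the two tools above is invoked here):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Placeholder data: X_pos are the known positives, X_unl the unlabeled pool
    # (hypothetical names; substitute your own feature matrices).
    rng = np.random.default_rng(0)
    X_pos = rng.normal(loc=1.0, size=(100, 5))
    X_unl = rng.normal(loc=0.0, size=(400, 5))

    # PU relabeling: +1 for every known positive, -1 for every unlabeled example.
    X = np.vstack([X_pos, X_unl])
    y = np.concatenate([np.ones(len(X_pos)), -np.ones(len(X_unl))])

    # Train a probabilistic binary classifier, then score examples for ranking.
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    scores = clf.predict_proba(X)[:, 1]  # column 1 corresponds to class +1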

Both programs report some performance metrics. I would also suggest using the standardized and well-established binary from the KDD Cup 2004: "perf". Get it here.
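If you prefer to compute the headline metrics yourself instead of (or alongside) the perf binary, here is a small sketch using scikit-learn's metric functions, with placeholder labels and scores standing in for the output of the previous step:

    import numpy as np
    from sklearn.metrics import roc_auc_score, balanced_accuracy_score

    # Placeholder ground-truth labels in {-1, +1} and real-valued scores.
    y_true = np.array([1, 1, 1, -1, -1, -1, -1, -1])
    scores = np.array([0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05])

    print("AUC (ROC):", roc_auc_score(y_true, scores))
    print("Balanced accuracy:",
          balanced_accuracy_score(y_true, np.where(scores >= 0.5, 1, -1)))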

Hope this helps those wondering how it works in practice. Perhaps I prevented the situation from this XKCD.

Upvotes: 2
