Harsh Shah

Reputation: 348

Machine learning algorithm to classify only positive and unlabeled data

I am trying to classify text using only positive (labeled) examples and unlabeled data. I want the algorithm to identify the positive examples and mark everything else as negative. What would be a good machine learning algorithm for this kind of data? I tried several classifiers in Weka, but almost all of them produce a lot of false positives.

Upvotes: 2

Views: 362

Answers (1)

user2566092

Reputation: 4661

If you believe that the unlabelled data is mostly negative, then the best option is probably to label all of it as "negative" and train your classifier of choice. Note that if an unlabelled test point is predicted positive, the prediction is not necessarily wrong: some of your unlabelled data could really be positive. This also makes it hard to judge how well your classifier is doing in your setting. If you believe that your unlabelled data might be biased toward the positives, you are probably better off training a so-called "one-class classifier" on the positive data alone; a popular example is the one-class SVM.
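A minimal sketch of both options, assuming Python with scikit-learn (the question used Weka, so this is a swapped-in toolkit). The helper names `fit_unlabeled_as_negative` and `fit_one_class` are hypothetical, the toy documents are made up, and TF-IDF features, logistic regression with `class_weight="balanced"`, and a linear-kernel `OneClassSVM` are illustrative choices, not tuned recommendations.

```python
# Sketch only: scikit-learn is an assumption (the question used Weka).
# fit_unlabeled_as_negative and fit_one_class are hypothetical helper names.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.svm import OneClassSVM


def fit_unlabeled_as_negative(positive_docs, unlabeled_docs):
    """Option 1: label every unlabeled document as negative (0) and train an
    ordinary binary classifier. Sensible only if most unlabeled data is negative."""
    docs = list(positive_docs) + list(unlabeled_docs)
    labels = [1] * len(positive_docs) + [0] * len(unlabeled_docs)
    vectorizer = TfidfVectorizer()
    X = vectorizer.fit_transform(docs)
    # class_weight="balanced" reweights classes inversely to their frequency,
    # because positives are usually far fewer than unlabeled documents.
    clf = LogisticRegression(class_weight="balanced")
    clf.fit(X, labels)
    return vectorizer, clf


def fit_one_class(positive_docs, nu=0.1):
    """Option 2: learn a region around the positives only, ignoring the
    unlabeled data. nu bounds the fraction of training positives that may
    fall outside the learned region."""
    vectorizer = TfidfVectorizer()
    X = vectorizer.fit_transform(positive_docs)
    model = OneClassSVM(kernel="linear", nu=nu)
    model.fit(X)
    return vectorizer, model


if __name__ == "__main__":
    positives = ["refund my order please", "I want a refund for this order",
                 "order refund request"]
    # Made-up unlabeled pool; one of these is actually a positive.
    unlabeled = ["the weather is sunny today", "great football match last night",
                 "new phone release event", "please refund the damaged order",
                 "sunny weekend at the beach", "football scores and highlights"]

    vec, clf = fit_unlabeled_as_negative(positives, unlabeled)
    # Positive-class probability for each unlabeled document. A high score here
    # may point to a hidden positive rather than a classifier error.
    probs = clf.predict_proba(vec.transform(unlabeled))[:, 1]
    for doc, p in zip(unlabeled, probs):
        print(f"{p:.2f}  {doc}")

    oc_vec, oc = fit_one_class(positives)
    # predict() returns +1 for documents inside the learned region, -1 outside.
    print(oc.predict(oc_vec.transform(["refund my order", "sunny football weekend"])))
```

Because the first option treats hidden positives as negatives, ranking unlabeled documents by their predicted probability (as in the loop above) can be more informative than reading off hard labels.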

Upvotes: 3
