user2295350

Reputation: 303

Multi-Label Document Classification

I have a database in which I store data based on the following three fields: id, text, {labels}. Note that each text has been assigned more than one label/tag/class. I want to build a model (Weka/RapidMiner/Mahout) that will be able to recommend/classify a set of labels/tags/classes for a given text.

I have heard about SVM and the Naive Bayes classifier, but I am not sure whether they support multi-label classification. Anything that points me in the right direction is more than welcome!

Upvotes: 7

Views: 3930

Answers (3)

Aditya Mogadala

Reputation: 31

I can suggest some tools, built as extensions to Weka, that do multi-label classification:

  1. MEKA: A Multi-label Extension to WEKA
  2. Mulan: A Java library for multi-label learning

There is also an SVM library extension, SVMLib. If you are happy with Python packages, scikit-learn also provides support for multi-label classification.
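As an illustration, here is a minimal sketch of multi-label text classification in scikit-learn; the texts, label sets, and the choice of LinearSVC below are placeholders, not something from the original question:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.preprocessing import MultiLabelBinarizer
    from sklearn.svm import LinearSVC

    # Toy data: each text carries one or more labels (placeholders).
    texts = ["cheap flights to rome", "train strike in rome", "budget airline review"]
    labels = [["travel", "deals"], ["news"], ["travel", "reviews"]]

    # Turn the label sets into a binary indicator matrix, one column per label.
    binarizer = MultiLabelBinarizer()
    Y = binarizer.fit_transform(labels)

    # Bag-of-words features; one binary SVM per label via one-vs-rest.
    vectorizer = TfidfVectorizer()
    X = vectorizer.fit_transform(texts)
    classifier = OneVsRestClassifier(LinearSVC())
    classifier.fit(X, Y)

    # Predict label sets for new texts.
    X_new = vectorizer.transform(["rome flight deals"])
    print(binarizer.inverse_transform(classifier.predict(X_new)))

Any binary estimator can be swapped in for LinearSVC, since OneVsRestClassifier only needs a classifier it can train once per label column.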

Also, the ICML 2013 paper "Efficient Multi-label Classification with Many Labels" should help if you want to implement one on your own.

Upvotes: 0

miguelmalvarez

Reputation: 930

SVM is a binary classifier by nature, but there are many alternatives that allow it to be applied to multi-label settings, basically by combining multiple binary instances of SVM.

Some examples are described in the multi-class section of the SVM Wikipedia article. I am not sure whether you are interested in the details, but these approaches are included in Weka and RapidMiner. For example, the SMO classifier is one of the variations for applying SVM to multi-label problems.

Naive Bayes can be directly applied to multi-label environments.

Upvotes: 1

Fred Foo

Reputation: 363587

The basic multi-label classification method is one-vs.-the-rest (OvR), also called binary relevance (BR). The basic idea is that you take an off-the-shelf binary classifier, such as Naive Bayes or an SVM, and then train K instances of it to solve K independent binary classification problems, one per label. In Python-like pseudocode:

for each class k:
    learner = SVM(settings)                        # any off-the-shelf binary learner
    labels = [k in labels_of(x) for x in samples]  # positive iff sample x carries label k
    learner.learn(samples, labels)

Then at prediction time, you just run each of the binary classifiers on a sample and collect the labels for which they predict positive.

(Both training and prediction can obviously be done in parallel, since the problems are assumed to be independent. See Wikipedia for links to two Java packages that do multi-label classification.)
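A minimal, self-contained version of that loop, written here against scikit-learn with Naive Bayes as the base learner (the data, label names, and the predict_labels helper are illustrative assumptions, not part of the answer):

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB

    # Toy multi-label data (placeholders).
    samples = ["cheap flights to rome", "train strike in rome", "budget airline review"]
    label_sets = [{"travel", "deals"}, {"news"}, {"travel", "reviews"}]
    all_labels = sorted(set().union(*label_sets))

    vectorizer = CountVectorizer()
    X = vectorizer.fit_transform(samples)

    # Binary relevance: one independent binary classifier per label.
    classifiers = {}
    for k in all_labels:
        y = [int(k in labels) for labels in label_sets]  # 1 if the sample carries label k
        classifiers[k] = MultinomialNB().fit(X, y)

    # Prediction: run every binary classifier and collect the positive labels.
    def predict_labels(text):
        x = vectorizer.transform([text])
        return {k for k, clf in classifiers.items() if clf.predict(x)[0] == 1}

    print(predict_labels("rome flight deals"))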

Upvotes: 4
