Reputation: 303
I have a database in which I store data based on the following three fields: id, text, {labels}. Note that each text has been assigned more than one label/tag/class. I want to build a model (Weka / RapidMiner / Mahout) that will be able to recommend/assign a set of labels/tags/classes to a given text.
I have heard about SVM and the Naive Bayes classifier, but I am not sure whether they support multi-label classification. Anything that points me in the right direction is more than welcome!
Upvotes: 7
Views: 3930
Reputation: 31
I can suggest some tools that are extensions to Weka and do multi-label classification, such as MEKA and MULAN.
There is also an SVM library extension, LibSVM. If you are happy with Python packages, scikit-learn also provides support for multi-label classification.
Also, if you want to implement one on your own, the recent ICML 2013 paper "Efficient Multi-label Classification with Many Labels" should help with the implementation.
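For the scikit-learn route mentioned above, here is a minimal sketch that wraps a linear SVM in a one-vs-rest scheme. The toy texts, label sets, and the choice of LinearSVC are illustrative assumptions, not part of this answer.

    # Multi-label text classification sketch: one-vs-rest over a linear SVM.
    # Toy data; swap in your own texts/labels from the database.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.preprocessing import MultiLabelBinarizer
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.svm import LinearSVC

    texts = ["cheap flights to paris", "python machine learning tutorial",
             "hotel deals in rome", "statistics with python"]
    labels = [["travel"], ["programming", "ml"],
              ["travel", "deals"], ["programming", "ml"]]

    X = TfidfVectorizer().fit_transform(texts)   # texts -> sparse tf-idf features
    mlb = MultiLabelBinarizer()
    Y = mlb.fit_transform(labels)                # label sets -> binary indicator matrix

    clf = OneVsRestClassifier(LinearSVC()).fit(X, Y)   # one binary SVM per label
    print(mlb.inverse_transform(clf.predict(X)))       # predictions back to label sets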
Upvotes: 0
Reputation: 930
SVM is a binary classifier by nature, but there are many schemes that allow it to be applied to multi-label settings, basically by combining multiple binary instances of SVM.
Some examples are given in the multi-class section of the SVM Wikipedia article. I am not sure whether you are interested in the details, but such schemes are included in Weka and RapidMiner. For example, Weka's SMO classifier is one of the variations for applying SVM beyond binary problems.
Naive Bayes can be directly applied to multi-label environments.
Upvotes: 1
Reputation: 363587
The basic multilabel classification method is one-vs.-the-rest (OvR), also called binary relevance (BR). The basic idea is that you take an off-the-shelf binary classifier, such as Naive Bayes or an SVM, then create K instances of it to solve K independent classification problems. In Python-like pseudocode:
    learners = {}                      # keep one binary classifier per class
    for each class k:
        learner = SVM(settings)        # for example
        labels = [class_of(x) == k for x in samples]   # True iff x belongs to class k
        learner.learn(samples, labels)
        learners[k] = learner
Then at prediction time, you just run each of the binary classifiers on a sample and collect the labels for which they predict positive.
(Both training and prediction can obviously be done in parallel, since the problems are assumed to be independent. See Wikipedia for links to two Java packages that do multi-label classification.)
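If you prefer something runnable over pseudocode, here is a minimal sketch of the same binary-relevance loop, with scikit-learn's LinearSVC standing in for the off-the-shelf binary classifier; the toy texts and label sets are made-up assumptions.

    # Binary relevance by hand: one binary classifier per label, then collect
    # the labels whose classifier predicts positive. Toy data for illustration.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.svm import LinearSVC

    texts = ["cheap flights to paris", "python tutorial for beginners",
             "machine learning in python", "hotel deals in rome"]
    labels = [{"travel"}, {"programming"}, {"programming", "ml"}, {"travel", "deals"}]
    classes = sorted(set().union(*labels))

    vectorizer = TfidfVectorizer()
    X = vectorizer.fit_transform(texts)

    # K independent binary problems, one per label.
    learners = {}
    for k in classes:
        y = [k in label_set for label_set in labels]
        learners[k] = LinearSVC().fit(X, y)

    # Prediction: keep every label whose binary classifier says "positive".
    x_new = vectorizer.transform(["machine learning course in python"])
    print({k for k, clf in learners.items() if clf.predict(x_new)[0]})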
Upvotes: 4