Reputation: 10069
I am using weka SMO classifier for classify the documents.There are many parameters for smo available like Kernal, tolerance etc.., I tested using different parameters but i not get good result large data set.
For more than 90 category only 20% documents getting correctly classified.
Please anyone tell me the best set of parameter to get highest performance in SMO.
Upvotes: 1
Views: 1686
Reputation: 28492
Principal issue here is not classification itself, but rather selecting suitable features. Using raw HTML leads to very large noise which in its turn makes classification results very poor. Thus, to get good results do the following:
Most probably classifier type won't play a big role here: dictionary-based features normally lead to quite exact results regardless of classification technique in use. You can use SVM (SMO), Naive Bayes, ANN or even kNN. More sophisticated methods include creation of category hierarchy, where, for example, category "coffee" is included into category "drinks" which in its turn is part of category "food".
Upvotes: 3