Reputation:
I want to apply SVM classification to a text-mining task using Python and NLTK, and report evaluation measures such as precision, recall, and accuracy. To do this, I preprocessed my dataset and split it into two text files: pos_file.txt (positive label) and neg_file.txt (negative label). Now I want to train an SVM classifier on a random sample of 70% of the data and test it on the remaining 30%. I have looked at some of the scikit-learn documentation, but I am not sure how to apply it here.
Both pos_file.txt and neg_file.txt can be treated as bags of words. Useful links-
Sample files: pos_file.txt
stackoverflowerror restor default properti page string present
multiprocess invalid assert fetch process inform
folderlevel discoveri option page seen configur scope select project level
Sample files: neg_file.txt
class wizard give error enter class name alreadi exist
unabl make work linux
eclips crash
semant error highlight undeclar variabl doesnt work
Furthermore, it would be interesting to apply the same approach with unigrams, bigrams, and trigrams. I look forward to your suggestions or sample code.
Upvotes: 3
Views: 7064
Reputation: 2160
Below is a very rough guideline of applying SVM to text classification:
The following scikit-learn documentation page is a good worked example of text classification in the sklearn framework, and I would recommend it as a starting point:
Classification of text documents using sparse features
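For the setup described in the question, a minimal sketch might look like the following. It assumes one document per line, uses TF-IDF features with a linear SVM (`LinearSVC`), and does a stratified random 70/30 split. The names `read_lines`, `evaluate_svm`, `POS_SAMPLE`, and `NEG_SAMPLE` are made up for illustration, and the sample lists simply copy the lines shown in the question; this is one reasonable way to wire the pieces together, not the only one.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Sample lines copied from the question (illustrative only; far too few
# documents for meaningful scores).
POS_SAMPLE = [
    "stackoverflowerror restor default properti page string present",
    "multiprocess invalid assert fetch process inform",
    "folderlevel discoveri option page seen configur scope select project level",
]
NEG_SAMPLE = [
    "class wizard give error enter class name alreadi exist",
    "unabl make work linux",
    "eclips crash",
    "semant error highlight undeclar variabl doesnt work",
]


def read_lines(path):
    """Hypothetical helper: return the non-empty lines of a text file."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def evaluate_svm(pos_docs, neg_docs, ngram_range=(1, 1), seed=0):
    """Train a linear SVM on a random 70% split and score it on the other 30%."""
    docs = list(pos_docs) + list(neg_docs)
    labels = [1] * len(pos_docs) + [0] * len(neg_docs)

    # Stratify so both classes appear in the training and the test part.
    train_docs, test_docs, y_train, y_test = train_test_split(
        docs, labels, test_size=0.3, random_state=seed, stratify=labels
    )

    # ngram_range=(1, 1) is unigrams only, (1, 2) adds bigrams,
    # (1, 3) adds trigrams as well.
    model = make_pipeline(TfidfVectorizer(ngram_range=ngram_range), LinearSVC())
    model.fit(train_docs, y_train)
    predicted = model.predict(test_docs)

    precision, recall, f1, _ = precision_recall_fscore_support(
        y_test, predicted, average="binary", zero_division=0
    )
    return {
        "accuracy": accuracy_score(y_test, predicted),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "n_test": len(y_test),
    }


if __name__ == "__main__":
    for ngrams in [(1, 1), (1, 2), (1, 3)]:
        print(ngrams, evaluate_svm(POS_SAMPLE, NEG_SAMPLE, ngram_range=ngrams))
```

With the real data, the two sample lists would be replaced by `read_lines("pos_file.txt")` and `read_lines("neg_file.txt")`. Changing `ngram_range` is all that is needed to compare unigram, bigram, and trigram features.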
Upvotes: 8