Can't get NLTK-Trainer to recognize or work with scikit-learn classifiers

Question

I've been using the (excellent) NLTK-Trainer to train a NaiveBayes classifier that classifies snippets of text. NLTK-Trainer also supports the scikit-learn algorithms, and I would like to use them in the hope of reducing memory usage and improving accuracy.

However, when I specify one of the scikit-learn classifiers while running train_classifier.py, it fails with this error:

train_classifier.py: error: argument --classifier/--algorithm: invalid choice: 'sklearn.BernoulliNB' (choose from 'NaiveBayes', 'DecisionTree', 'Maxent', 'GIS', 'IIS', 'MEGAM', 'TADM')
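The error lists only the choices that were actually registered on the parser, and no sklearn.* name appears among them. One plausible explanation (an assumption about how train_classifier.py builds its option list, not something confirmed here) is that the sklearn.* names are added only when the scikit-learn imports succeed. A minimal sketch of that pattern with the standard argparse module; sklearn_importable, classifier_choices and build_parser are hypothetical names, and 'sklearn.BernoulliNB' stands in for whatever the real script registers:

```python
import argparse

# The NLTK algorithm names quoted in the error message above.
NLTK_ALGORITHMS = ['NaiveBayes', 'DecisionTree', 'Maxent',
                   'GIS', 'IIS', 'MEGAM', 'TADM']


def sklearn_importable():
    """Return True if scikit-learn's naive_bayes module can be imported."""
    try:
        from sklearn import naive_bayes  # noqa: F401
    except ImportError:
        return False
    return True


def classifier_choices(sklearn_available):
    """Hypothetical: add sklearn.* names only when scikit-learn imported cleanly."""
    choices = list(NLTK_ALGORITHMS)
    if sklearn_available:
        # Illustrative entry; the real script may register many more.
        choices.append('sklearn.BernoulliNB')
    return choices


def build_parser(sklearn_available):
    """Build a parser whose --classifier/--algorithm option is limited to the choices."""
    parser = argparse.ArgumentParser(prog='train_classifier.py')
    parser.add_argument('--classifier', '--algorithm',
                        choices=classifier_choices(sklearn_available))
    return parser


if __name__ == '__main__':
    available = sklearn_importable()
    print('scikit-learn importable:', available)
    print('accepted --classifier values:', classifier_choices(available))
```

If a pattern like this is in play, the "invalid choice" message is a symptom of a failed scikit-learn import rather than a command-line problem.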

I am running the 32-bit Anaconda distribution (2.20) of Python 3.4.3 on Windows 7. "pip freeze" gives me the following: NLTK 3.0.4, scikit-learn 0.16.1. I believe I am using the latest version of NLTK-Trainer (I downloaded it a month ago).
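pip freeze lists the packages of the environment that the pip on your PATH belongs to, which is not necessarily the interpreter that actually runs train_classifier.py. A quick check, run with the same python command you use for the script, prints the interpreter path and each package's version and location, or the exception raised while importing it. This is a sketch: describe_module is a hypothetical helper, and NumPy and SciPy are included only because scikit-learn depends on them.

```python
import importlib
import sys


def describe_module(name):
    """Import `name` and report its version and file, or the exception raised."""
    try:
        module = importlib.import_module(name)
    except Exception as exc:  # ImportError, or any error raised while the package initialises
        return {'name': name, 'error': '%s: %s' % (type(exc).__name__, exc)}
    return {
        'name': name,
        'version': getattr(module, '__version__', None),
        'file': getattr(module, '__file__', None),
    }


if __name__ == '__main__':
    # Which interpreter is this, and what does it actually import?
    print('Interpreter:', sys.executable)
    print('Python:', sys.version)
    for name in ('nltk', 'sklearn', 'numpy', 'scipy'):
        print(describe_module(name))
```

The %-formatting keeps the snippet compatible with Python 3.4.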

After some research, I have two theories about what is going wrong. First, there may be some sort of argparse problem that prevents --classifier sklearn.BernoulliNB from being passed to train_classifier.py correctly. When I look at the traceback for the error, I get this:

nltk_data\nltk-trainer-master\nltk-trainer-master\train_classifier.py in <module>()
    131 nltk_trainer.classification.args.add_sklearn_args(parser)
    132
--> 133 args = parser.parse_args()
    134

AppData\Local\Continuum\Anaconda3\lib\argparse.py in parse_args(self, args, namespace)
   1726     # =====================================
   1727     def parse_args(self, args=None, namespace=None):
-> 1728         args, argv = self.parse_known_args(args, namespace)
   1729         if argv:
   1730             msg = _('unrecognized arguments: %s')

   1765         except ArgumentError:
   1766             err = _sys.exc_info()[1]
-> 1767             self.error(str(err))
   1768
   1769     def _parse_known_args(self, arg_strings, namespace):
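The traceback ends in argparse's ordinary error handling: parse_args calls parse_known_args, which catches the ArgumentError and passes its text to self.error, and ArgumentParser.error prints the usage plus the message and exits with status 2. A sketch that overrides error() in a subclass to raise instead (RaisingArgumentParser and try_parse are hypothetical names) reproduces the same kind of message purely because the value is not among the registered choices, which suggests the argument itself is reaching argparse intact:

```python
import argparse


class RaisingArgumentParser(argparse.ArgumentParser):
    """ArgumentParser whose error() raises instead of printing usage and exiting."""

    def error(self, message):
        raise ValueError(message)


# Only the NLTK algorithm names are registered, as in the error message above.
parser = RaisingArgumentParser(prog='train_classifier.py')
parser.add_argument('--classifier', '--algorithm',
                    choices=['NaiveBayes', 'DecisionTree', 'Maxent',
                             'GIS', 'IIS', 'MEGAM', 'TADM'])


def try_parse(argv):
    """Return (namespace, None) on success or (None, argparse's message) on failure."""
    try:
        return parser.parse_args(argv), None
    except ValueError as exc:
        return None, str(exc)


if __name__ == '__main__':
    namespace, message = try_parse(['--classifier', 'sklearn.BernoulliNB'])
    print(message)
```

The exact wording of the printed message can vary slightly between Python versions.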

My other hypothesis is that the scikit-learn files included with Anaconda are in a location where NLTK-Trainer can't find them. Following Jacob Perkins' recommendation in a comment, I can run 'from nltk.classify import scikitlearn' without error. However, when I look further into NLTK-Trainer's args.py (nltk_trainer/classification/args.py), I cannot run the import statements that follow that line. All of these lines throw errors:

    from sklearn.feature_extraction.text import TfidfTransformer
    from sklearn.pipeline import Pipeline
    from sklearn import ensemble, feature_selection, linear_model, naive_bayes, neighbors, svm, tree
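Importing any sklearn submodule first imports the top-level sklearn package, so a single underlying failure will make every one of these lines fail. Running each import separately and printing the full traceback shows which exception it actually is (for example, a missing package versus an error raised while scikit-learn or one of its dependencies initialises). A diagnostic sketch; MODULES mirrors the statements above plus the NLTK wrapper module, and failed_imports is a hypothetical helper:

```python
import importlib
import traceback

# Modules imported by the statements quoted above, plus the NLTK wrapper.
MODULES = [
    'nltk.classify.scikitlearn',
    'sklearn.feature_extraction.text',
    'sklearn.pipeline',
    'sklearn.ensemble',
    'sklearn.feature_selection',
    'sklearn.linear_model',
    'sklearn.naive_bayes',
    'sklearn.neighbors',
    'sklearn.svm',
    'sklearn.tree',
]


def failed_imports(names):
    """Import each module on its own; map every failing name to its full traceback."""
    failures = {}
    for name in names:
        try:
            importlib.import_module(name)
        except Exception:
            failures[name] = traceback.format_exc()
    return failures


if __name__ == '__main__':
    problems = failed_imports(MODULES)
    if not problems:
        print('All modules imported successfully.')
    for name, tb in sorted(problems.items()):
        print('=' * 60)
        print(name)
        print(tb)
```

Reading the first traceback printed is usually enough, since the later ones tend to repeat the same root cause.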

This has been really frustrating, and I can't quite put my finger on why it isn't working. Any assistance would be much appreciated!
