Reputation: 135
I've been using the (excellent) NLTK-Trainer to train a NaiveBayes classifier to classify snippets of text. I see that NLTK-Trainer also supports the scikit-learn algorithms, and I would like to use these in hopes of decreasing memory usage and increasing accuracy.
However, when I try to specify one of the scikit-learn classifiers when I run train_classifier.py, it throws an error:
train_classifier.py: error: argument --classifier/--algorithm: invalid choice: 'sklearn.BernoulliNB' (choose from 'NaiveBayes', 'DecisionTree', 'Maxent', 'GIS', 'IIS', 'MEGAM', 'TADM')
I am running the 32-bit Anaconda distribution (2.20) of Python 3.4.3 on Windows 7. "pip freeze" gives me the following: NLTK 3.0.4, scikit-learn 0.16.1. I believe I am using the latest version of NLTK-Trainer (I downloaded it a month ago).
After doing some research, I have two theories about what is going wrong: 1. There is some sort of argparse error that isn't passing --classifier sklearn.BernoulliNB to train_classifier.py correctly. When I pull up a traceback for the error, it gives me this:
nltk_data\nltk-trainer-master\nltk-trainer-master\train_classifier.py in <module>()
131 nltk_trainer.classification.args.add_sklearn_args(parser)
132
--> 133 args = parser.parse_args()
134
AppData\Local\Continuum\Anaconda3\lib\argparse.py in parse_args(self, args, namespace)
1726 # =====================================
1727 def parse_args(self, args=None, namespace=None):
-> 1728 args, argv = self.parse_known_args(args, namespace)
1729 if argv:
1730 msg = _('unrecognized arguments: %s')
1765 except ArgumentError:
1766 err = _sys.exc_info()[1]
-> 1767 self.error(str(err))
1768
1769 def _parse_known_args(self, arg_strings, namespace):
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.pipeline import Pipeline
from sklearn import ensemble, feature_selection, linear_model, naive_bayes, neighbors, svm, tree
This has been really frustrating, and I can't quite put my finger on why it isn't working. Any assistance would be much appreciated!
Upvotes: 3
Views: 312
Reputation: 231355
argparse is just code that takes your command-line arguments and parses them. It does not use or act on those arguments. That's done by the code that follows. The parser is just the gatekeeper, making sure that your inputs look correct.
I'm not familiar with NLTK-Trainer, but I can see what its parser is doing.
From the error message it is clear that your argument, 'sklearn.BernoulliNB', is getting through. But the --classifier argument was set up to accept only one of the strings in the choices list, ['NaiveBayes', 'DecisionTree', ...]. It doesn't accept just any name or module reference.
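To illustrate how a choices list produces exactly this kind of error, here is a minimal sketch. It is not NLTK-Trainer's actual code: build_parser and try_parse are hypothetical names, and the choices are simply copied from the error message in the question.

```python
import argparse

# Assumption: a stand-in parser, not the real one from train_classifier.py.
# The choices below are copied from the error message in the question.
CHOICES = ['NaiveBayes', 'DecisionTree', 'Maxent', 'GIS', 'IIS', 'MEGAM', 'TADM']


def build_parser():
    parser = argparse.ArgumentParser(prog='train_classifier.py')
    # Any value outside CHOICES is rejected during parse_args().
    parser.add_argument('--classifier', '--algorithm',
                        choices=CHOICES, default='NaiveBayes')
    return parser


def try_parse(argv):
    """Return the accepted classifier name, or None if argparse rejects it."""
    try:
        return build_parser().parse_args(argv).classifier
    except SystemExit:
        # parser.error() prints the usage/"invalid choice" message and exits.
        return None
```

With this sketch, try_parse(['--classifier', 'sklearn.BernoulliNB']) prints an "invalid choice" message and returns None, while try_parse(['--classifier', 'NaiveBayes']) returns the name. The rejection happens inside the parser, before any training code runs.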
It is likely that the program takes an accepted name and maps it onto some other function, module or parameter.
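As a rough sketch of what such a mapping could look like (again, I'm guessing at the pattern, not describing NLTK-Trainer's code; the trainer functions and the TRAINERS dict are hypothetical):

```python
import argparse


# Hypothetical trainers; a real program would call library code here.
def train_naive_bayes(data):
    return ('NaiveBayes', len(data))


def train_decision_tree(data):
    return ('DecisionTree', len(data))


# Accepted command-line names mapped onto the functions that do the work.
TRAINERS = {
    'NaiveBayes': train_naive_bayes,
    'DecisionTree': train_decision_tree,
}


def run(argv, data):
    parser = argparse.ArgumentParser()
    # The parser only knows the names; the lookup happens afterwards.
    parser.add_argument('--classifier', choices=sorted(TRAINERS),
                        default='NaiveBayes')
    args = parser.parse_args(argv)
    return TRAINERS[args.classifier](data)
```

In a design like this, a new algorithm only becomes available on the command line once its name has been added to the mapping, which would explain why an arbitrary module reference is refused.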
Try calling this code with -h or --help to see what arguments it accepts. And go to the program documentation to see what it says about the input. Maybe there is some other way of specifying the alternative algorithms. The --classifier argument is clearly set up to accept only a predefined set of values.
Upvotes: 1