jovan lipovatz

Reputation: 31

NLTK Trainer: Cannot get Scikit-Learn classifiers to work

I am using Python 2.7 and NLTK Trainer, the awesome tool created by Jacob Perkins. I have successfully used the NaiveBayes classifier, but when I try to use any of the scikit-learn classifiers, it fails with an error. Please help. Here is the command I ran and the resulting error message.

C:\WINDOWS\system32>C:\Python27\python C:\Users\ned\Desktop\nltk-trainer-master\train_classifier.py --instances files --fraction 0.75 --no-pickle --min_score 2 --ngrams 1 2 3 --show-most-informative 10 movie_reviews --classifier sklearn.MultinomialNB



training sklearn.MultinomialNB classifier
C:\Python27\lib\site-packages\numpy\core\fromnumeric.py:2499: VisibleDeprecationWarning: `rank` is deprecated; use the `ndim` attribute or function instead. To find the rank of a matrix see `numpy.linalg.matrix_rank`.
  VisibleDeprecationWarning)
Traceback (most recent call last):
  File "C:\Users\ned\Desktop\nltk-trainer-master\train_classifier.py", line 385, in <module>
    print('accuracy: %f' % accuracy(classifier, test_feats))
  File "C:\Python27\lib\site-packages\nltk\classify\util.py", line 87, in accuracy
    results = classifier.classify_many([fs for (fs, l) in gold])
  File "C:\Python27\lib\site-packages\nltk\classify\scikitlearn.py", line 83, in classify_many
    X = self._vectorizer.transform(featuresets)
  File "C:\Users\ned\Desktop\nltk-trainer-master\sklearn\feature_extraction\dict_vectorizer.py", line 286, in transform
    return self._transform(X, fitting=False)
  File "C:\Users\ned\Desktop\nltk-trainer-master\sklearn\feature_extraction\dict_vectorizer.py", line 196, in _transform
    result_matrix.sort_indices()
  File "C:\Python27\lib\site-packages\scipy\sparse\compressed.py", line 619, in sort_indices
    fn( len(self.indptr) - 1, self.indptr, self.indices, self.data)
  File "C:\Python27\lib\site-packages\scipy\sparse\sparsetools\csr.py", line 546, in csr_sort_indices
    return _csr.csr_sort_indices(*args)
TypeError: Array of type 'byte' required.  Array of type 'bool' given

I am using the following versions: Python 2.7.10

Python 2.7 numpy 1.9.1

Python 2.7 scikit-learn 0.16.1

Python 2.7 scipy 0.10.1

Python 2.7 NLTK 3.0.4

Argparse 1.3.0

*** Thanks everybody for all the help. The problem was indeed an out-of-date library. I installed up-to-date versions from here: http://www.lfd.uci.edu/~gohlke/pythonlibs/ and followed the simple installation guide from here: https://www.youtube.com/watch?v=jnpC_Ib_lbc

Upvotes: 2

Views: 542

Answers (3)

espeed

Reputation: 4814

You're using scipy 0.10.1, which is several versions back -- try upgrading to scipy 0.14.

Here's an example of it working and the versions of the packages used...

$ python
Python 2.7.10 (default, Jul  5 2015, 14:15:43) 
[GCC 5.1.1 20150618 (Red Hat 5.1.1-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import scipy
>>> scipy.__version__
'0.14.1'
>>> import numpy
>>> numpy.__version__
'1.9.2'
>>> import sklearn
>>> sklearn.__version__
'0.16.1'
>>> import nltk
>>> nltk.__version__
'3.0.4'
>>> import argparse
>>> argparse.__version__
'1.1'

$ python train_classifier.py --instances files --fraction 0.75 --no-pickle --min_score 2 --ngrams 1 2 3 --show-most-informative 10 movie_reviews --classifier sklearn.MultinomialNB
loading movie_reviews
2 labels: [u'neg', u'pos']
calculating word scores
using bag of words from known set feature extraction
71903 words meet min_score and/or max_feats
1500 training feats, 500 testing feats
training sklearn.MultinomialNB with {'alpha': 1.0}
using dtype bool
training sklearn.MultinomialNB classifier
accuracy: 0.788000
neg precision: 0.918605
neg recall: 0.632000
neg f-measure: 0.748815
pos precision: 0.719512
pos recall: 0.944000
pos f-measure: 0.816609

Upvotes: 3

jackiekazil

Reputation: 5716

Possibly related? https://github.com/scipy/scipy/issues/2058 And if not, it might give you more clarification of the problem.

If, as in that ticket, this turns out to be a version problem, I would check the versions of everything. I think Python 3 is being more actively developed / supported than 2.7 these days.
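A version check across all the packages involved could be scripted roughly like this. It is only a sketch: `installed_versions` is a hypothetical helper name, and it assumes each package exposes a `__version__` attribute, which not every package does.

```python
import importlib


def installed_versions(names):
    """Return a dict mapping each module name to its reported version.

    A module that cannot be imported maps to None; a module without a
    __version__ attribute maps to the string "unknown".
    """
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
            continue
        versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    # Package list taken from the question; adjust as needed.
    packages = ["numpy", "scipy", "sklearn", "nltk", "argparse"]
    for name, version in sorted(installed_versions(packages).items()):
        print("%s: %s" % (name, version))
```

Running it on both a working and a failing machine and comparing the output should show which package is out of date.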

Upvotes: 0

earino

Reputation: 2935

I noticed an issue in the github repository for this project with the exact error message:

https://github.com/japerk/nltk-trainer/issues/12

The user stated:

Got it, I had trained the classifier on a different machine with diff versions of scipy and/or sklearn.

In your example above, it looks like you trained on the same machine that you are running on. Is that the case?
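For the cross-machine case described in that issue, one way to catch a mismatch early is to store the library versions next to the pickled classifier and compare them at load time. The following is a minimal sketch under that assumption; `dump_with_versions` and `load_with_versions` are hypothetical helper names, not part of NLTK Trainer, NLTK, or scikit-learn.

```python
import pickle


def dump_with_versions(obj, versions):
    """Pickle obj together with a dict of library versions used to build it."""
    return pickle.dumps({"versions": dict(versions), "payload": obj})


def load_with_versions(blob, current_versions):
    """Unpickle blob and return (payload, mismatches).

    mismatches maps a library name to (saved_version, current_version)
    for every library whose version differs from the one recorded.
    """
    bundle = pickle.loads(blob)
    mismatches = {}
    for name, saved_version in bundle["versions"].items():
        current = current_versions.get(name)
        if current != saved_version:
            mismatches[name] = (saved_version, current)
    return bundle["payload"], mismatches


if __name__ == "__main__":
    # Stand-in object; in practice this would be the trained classifier.
    blob = dump_with_versions({"label": "pos"}, {"scipy": "0.10.1"})
    payload, mismatches = load_with_versions(blob, {"scipy": "0.14.1"})
    print(mismatches)
```

An empty `mismatches` dict means the recorded versions match the current ones. The command in the question uses `--no-pickle`, so this only applies when a saved classifier is reused.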

Upvotes: 0
