TypeError in scikit-learn CountVectorizer

Question

I am trying to do some text analysis with scikit-learn. However, when I call fit_transform on a CountVectorizer, an error is raised. The example code and the resulting traceback are below:

    >>> from sklearn.feature_extraction.text import CountVectorizer
    >>> corpus = ['This is the first document.', 'This is the second second document.', 'And the third one.', 'Is this the first document?']
    >>> vectorizer = CountVectorizer(min_df=1)
    >>> X = vectorizer.fit_transform(corpus)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/Library/Python/2.6/site-packages/sklearn/feature_extraction/text.py", line 789, in fit_transform
        vocabulary, X = self._count_vocab(raw_documents, self.fixed_vocabulary)
      File "/Library/Python/2.6/site-packages/sklearn/feature_extraction/text.py", line 716, in _count_vocab
        vocabulary = defaultdict(None)
    TypeError: first argument must be callable

Is this a bug, or a problem with my installation? Other examples work fine.

ogrisel · Accepted Answer

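To summarize the discussion in the comments: this is a bug in Python 2.6.1, where defaultdict(None) (the call that fails at the bottom of the traceback) raises "TypeError: first argument must be callable". It has been fixed in more recent releases of Python 2.6 and in later versions (2.7+, 3.2+, ...), so upgrading Python resolves the error.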
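To check whether a particular interpreter is affected, you can test the failing call directly, without involving scikit-learn at all. This is a minimal sketch that uses feature detection rather than a hard-coded version number; the helper name defaultdict_accepts_none is hypothetical, not part of Python or scikit-learn.

```python
import sys
from collections import defaultdict


def defaultdict_accepts_none():
    """Return True if this interpreter allows defaultdict(None).

    CountVectorizer's _count_vocab calls defaultdict(None) (see the
    traceback in the question), so it fails where this returns False.
    """
    try:
        defaultdict(None)
    except TypeError:
        # Affected interpreters raise "first argument must be callable" here.
        return False
    return True


if __name__ == "__main__":
    # Report the running interpreter version alongside the check result.
    version = ".".join(str(part) for part in sys.version_info[:3])
    sys.stdout.write("Python %s: defaultdict(None) supported = %s\n"
                     % (version, defaultdict_accepts_none()))
```

If the check returns False, upgrading the interpreter is the fix; patching the installed scikit-learn source is not recommended.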
