Reputation: 61
I'm trying out Python instead of R for data analysis and am having a bit of trouble. I've been reading scikit-learn's documentation and tried running their k-means example on my own machine, but I get this error message:
Extracting features from the training dataset using a sparse vectorizer
Traceback (most recent call last):
  File "kmeans.py", line 104, in <module>
    X = vectorizer.fit_transform(dataset.data)
  File "/Library/Python/2.7/site-packages/scikit_learn-0.15_git-py2.7-macosx-10.9-intel.egg/sklearn/feature_extraction/text.py", line 1238, in fit_transform
    return self._tfidf.transform(X, copy=False)
  File "/Library/Python/2.7/site-packages/scikit_learn-0.15_git-py2.7-macosx-10.9-intel.egg/sklearn/feature_extraction/text.py", line 1010, in transform
    X = normalize(X, norm=self.norm, copy=False)
  File "/Library/Python/2.7/site-packages/scikit_learn-0.15_git-py2.7-macosx-10.9-intel.egg/sklearn/preprocessing/data.py", line 542, in normalize
    inplace_csr_row_normalize_l2(X)
  File "sparsefuncs.pyx", line 146, in sklearn.utils.sparsefuncs.inplace_csr_row_normalize_l2 (sklearn/utils/sparsefuncs.c:2714)
ValueError: Buffer dtype mismatch, expected 'int' but got 'long'
For reference, the code is here: http://scikit-learn.org/stable/auto_examples/document_clustering.html
It took me a bit of fiddling to get the whole SciPy stack installed, but I'm sure I have it now. I'm just wondering why copy-pasting their code and running it would give an error (I'm sure they wouldn't put buggy code on their site). Any idea what's happening and what the fix is?
Upvotes: 1
Views: 858
Reputation: 905
One helpful approach is to install Anaconda and the PyCharm or Eclipse IDE, then point the IDE's interpreter at the Anaconda installation. See this link for more guidance: http://docs.continuum.io/anaconda/ide_integration.html. It is also very easy to update packages, including scikit-learn, with "conda update scikit-learn" or "conda update anaconda" from the shell.
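If you go this route, it may be worth confirming that the IDE really is running the Anaconda interpreter rather than the system Python under /Library/Python/2.7. Here is a rough sketch of such a check. The function names are made up for illustration, and the "conda" substring test is only a heuristic that assumes a default install location, not an official way to detect Anaconda.

```python
import sys


def interpreter_info():
    """Describe the interpreter that is actually running this script."""
    return {
        "executable": sys.executable,       # path to the python binary in use
        "prefix": sys.prefix,               # root of the installation or environment
        "version": sys.version.split()[0],  # e.g. "2.7.6" or "3.11.4"
    }


def looks_like_anaconda(info):
    """Heuristic only: default Anaconda installs usually have 'conda' in the path."""
    return "conda" in info["prefix"].lower()


if __name__ == "__main__":
    info = interpreter_info()
    for key in ("executable", "prefix", "version"):
        print(key + ": " + info[key])
    print("looks like Anaconda: " + str(looks_like_anaconda(info)))
```

Run it once from the shell and once from inside the IDE. If the two runs report different prefixes, the IDE is still pointed at a different Python.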
Upvotes: 0
Reputation: 2476
How did you install the SciPy stack? I strongly suggest that you don't try to assemble the stack yourself, as it is quite challenging to do. I would instead recommend Anaconda: https://store.continuum.io/cshop/anaconda/
Disclaimer: 1) I don't work for these guys. 2) Anaconda has a free version. It's good.
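To help answer the "how did you install it" question, it can be useful to post the versions of the core packages that your interpreter actually picks up. Below is a minimal sketch for that. The helper name report_versions is hypothetical, and it assumes each package exposes a __version__ attribute, which NumPy, SciPy and scikit-learn normally do. It catches any exception on import because a mismatched binary build can fail with something other than ImportError.

```python
import importlib


def report_versions(names=("numpy", "scipy", "sklearn")):
    """Return a dict mapping each package name to its version or a failure note."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except Exception as exc:  # broken builds may not raise ImportError
            versions[name] = "import failed: " + type(exc).__name__
        else:
            # Fall back to "unknown" if the package has no __version__ attribute.
            versions[name] = str(getattr(module, "__version__", "unknown"))
    return versions


if __name__ == "__main__":
    for name, version in sorted(report_versions().items()):
        print(name + ": " + version)
```

If the reported versions come from different sources (for example a git build of scikit-learn next to a separately installed SciPy), that kind of mix is a plausible suspect for errors like the one in the question, though this script alone cannot confirm it.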
Upvotes: 0