cnstlungu
cnstlungu

Reputation: 557

Using Pandas and Sklearn.Neighbors

I'm trying to fit a KNN model on a dataframe, using Python 3.5/Pandas/Sklearn.neighbors. I've imported the data, split it into training and testing data and labels, but when I try to predict using it, I get the following error. I'm quite new to Pandas so any help would be appreciated, thanks!

enter image description here

import pandas as pd
from sklearn import cross_validation
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
seeds = pd.read_csv('seeds.tsv',sep='\t',names=['Area','Perimeter','Compactness','Kern_len','Kern_width','Assymetry','Kern_groovlen','Species'])
data = seeds.iloc[:,[0,1,2,3,4,5,6]]
labels = seeds.iloc[:,[7]]
x_train, x_test, y_train, y_test = cross_validation.train_test_split(data,labels, test_size=0.4, random_state=1 )
knn = KNeighborsRegressor(n_neighbors=30)
knn.fit(x_train,y_train)
knn.predict(x_test)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-121-2292e64e5ab8> in <module>()
----> 1 knn.predict(x_test)

C:\Anaconda3\lib\site-packages\sklearn\neighbors\regression.py in predict(self, X)
    151 
    152         if weights is None:
--> 153             y_pred = np.mean(_y[neigh_ind], axis=1)
    154         else:
    155             y_pred = np.empty((X.shape[0], _y.shape[1]), dtype=np.float)

C:\Anaconda3\lib\site-packages\numpy\core\fromnumeric.py in mean(a, axis, dtype, out, keepdims)
   2876 
   2877     return _methods._mean(a, axis=axis, dtype=dtype,
-> 2878                           out=out, keepdims=keepdims)
   2879 
   2880 

C:\Anaconda3\lib\site-packages\numpy\core\_methods.py in _mean(a, axis, dtype, out, keepdims)
     66     if isinstance(ret, mu.ndarray):
     67         ret = um.true_divide(
---> 68                 ret, rcount, out=ret, casting='unsafe', subok=False)
     69     elif hasattr(ret, 'dtype'):
     70         ret = ret.dtype.type(ret / rcount)

TypeError: unsupported operand type(s) for /: 'str' and 'int'

Upvotes: 1

Views: 8799

Answers (1)

ode2k
ode2k

Reputation: 2723

You should be using the KNeighborsClassifier for this KNN. You are trying to predict the label Species for classification. The regressor in your code above is trying to train and predict continuously valued numerical variables, which is where your problem is being introduced.

from sklearn.neighbors import KNeighborsClassifier
seeds = pd.read_csv('seeds.tsv',sep='\t',names=['Area','Perimeter','Compactness','Kern_len','Kern_width','Assymetry','Kern_groovlen','Species'])
data = seeds.iloc[:,[0,1,2,3,4,5,6]]
labels = seeds.iloc[:,[7]]
x_train, x_test, y_train, y_test = cross_validation.train_test_split(data,labels, test_size=0.4, random_state=1 )
knn = KNeighborsClassifier(n_neighbors=30)

http://scikit-learn.org/stable/auto_examples/neighbors/plot_classification.html

Here is what the regressor would plot compared to the classifier (which you want to use).

Regressor

Classifier

Upvotes: 3

Related Questions