Reputation: 859
I'm learning some basics in machine learning in Python (scikit - learn) and when I tried to implement the K-nearest neighbors algorithm an error occurs: ValueError: Found input variables with inconsistent numbers of samples: [426, 143]. I have no idea how to deal with it.
This is my code:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
cancer = load_breast_cancer()
X_train, y_train, X_test, y_test = train_test_split(cancer.data,cancer.target,
stratify =
cancer.target,
random_state = 0)
clf = KNeighborsClassifier(n_neighbors = 6)
clf.fit(X_train, y_train)`
Upvotes: 0
Views: 676
Reputation: 426
train_test_split
returns a tuple in the order X_train, X_test, y_train, y_test
You've assigned the return values to the wrong variables so you are fitting with the training data and the test data instead of the training data and the training labels.
It should be
X_train, X_test, y_train, y_test = train_test_split()
Upvotes: 1