Hendrra

Reputation: 859

"Inconsistent numbers of samples" - scikit-learn

I'm learning some machine learning basics in Python (scikit-learn). When I tried to implement the K-nearest neighbors algorithm, I got this error: ValueError: Found input variables with inconsistent numbers of samples: [426, 143]. I have no idea how to deal with it.
This is my code:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
cancer = load_breast_cancer()
X_train, y_train, X_test, y_test = train_test_split(cancer.data, cancer.target,
                                                    stratify=cancer.target,
                                                    random_state=0)
clf = KNeighborsClassifier(n_neighbors=6)
clf.fit(X_train, y_train)

Upvotes: 0

Views: 676

Answers (1)

dugup

Reputation: 426

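train_test_split returns a list in the order X_train, X_test, y_train, y_test: for each array you pass in, its training part followed by its test part.

You've assigned the return values to the wrong variables, so the name y_train actually holds the test features (143 rows). You are fitting the training data (426 rows) against the test data instead of against the training labels, which is the [426, 143] mismatch the error reports.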

It should be

X_train, X_test, y_train, y_test = train_test_split(cancer.data, cancer.target,
                                                    stratify=cancer.target, random_state=0)
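
For reference, here is a minimal sketch of the whole script with the unpacking order fixed, reusing the question's data and parameters. The two shape assertions and the test_accuracy variable are additions for illustration, not part of the original code; checking the row counts before fitting is just one way to catch this kind of mix-up early.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

cancer = load_breast_cancer()

# For each input array, train_test_split returns its train part
# followed by its test part: X_train, X_test, y_train, y_test.
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=0
)

# Illustrative sanity check (not in the original code): features and
# labels must have the same number of rows before calling fit().
assert X_train.shape[0] == y_train.shape[0]
assert X_test.shape[0] == y_test.shape[0]

clf = KNeighborsClassifier(n_neighbors=6)
clf.fit(X_train, y_train)

# Mean accuracy on the held-out split (illustrative variable name).
test_accuracy = clf.score(X_test, y_test)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)
```

With the default test size, the 569 samples split into 426 training rows and 143 test rows, which are the two numbers in the error message.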

Upvotes: 1
