Reputation: 55
X_train, X_test, y_train, y_test = train_test_split(features, df['Label'], test_size=0.2, random_state=111)
print (X_train.shape) # (540, 4196)
print (X_test.shape) # (136, 4196)
print (y_train.shape) # (540,)
print (y_test.shape) # (136,)
Fitting the classifier then raises an error:
from sklearn.svm import SVC
classifier = SVC(random_state = 0)
classifier.fit(features,y_train)
y_pred = classifier.predict(features)
Error:
ValueError: Found input variables with inconsistent numbers of samples: [676, 540]
I tried this.
Upvotes: 0
Views: 11341
Reputation: 427
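You want to call the fit function with your X_train, not with features. The error occurs because features and y_train don't have the same number of samples: features still holds all 676 rows, while y_train holds only the 540 rows kept for training.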
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X_train, X_test, y_train, y_test = train_test_split(features, df['Label'], test_size=0.2, random_state=111)
print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)

classifier = SVC(random_state=0)
# Fit on the training split only, so X and y have the same number of rows
classifier.fit(X_train, y_train)
# Predict on the held-out test split
y_pred = classifier.predict(X_test)
You'll likely also want to call predict with X_test (or X_train) rather than features. You may want to learn a bit more about train/test splits and why they are used.
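To illustrate why the split matters, here is a minimal sketch that fits on the training split and scores on both splits. The data comes from make_classification, which is only a synthetic stand-in for the asker's features and df['Label'] (an assumption, as is the choice of 20 features); the scores it prints say nothing about the real dataset.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for `features` and df['Label'] (assumption, not the asker's data)
X, y = make_classification(n_samples=676, n_features=20, random_state=0)

# Same split settings as in the question
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=111
)

classifier = SVC(random_state=0)
classifier.fit(X_train, y_train)  # X_train and y_train have the same number of rows

# Score on held-out rows (unseen during fit) and on the training rows
y_pred = classifier.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
train_accuracy = accuracy_score(y_train, classifier.predict(X_train))

print("train accuracy:", train_accuracy)
print("test accuracy:", test_accuracy)
```

The test score is the one that estimates how the model behaves on new data; the training score is usually more optimistic because the model has already seen those rows.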
Upvotes: 2
Reputation: 513
Why are you passing features together with y_train to .fit()? I think you are supposed to use X_train instead.
Instead of
classifier.fit(features, y_train)
Use:
classifier.fit(X_train, y_train)
You are trying to use two sets of data with different shapes: because you did the split earlier, features has more samples than y_train.
Also, your predict line should be:
y_pred = classifier.predict(X_test)
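The two numbers in the error message, [676, 540], can be traced back to the split. A rough sketch of the arithmetic, assuming (as I understand scikit-learn's behavior for a float test_size) that the test count is rounded up and the rest goes to training:

```python
import math

n_samples = 676   # rows in `features`, the first number in the error message
test_size = 0.2   # fraction passed to train_test_split

# Assumed rounding rule: test count rounded up, remainder used for training
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print(n_test, n_train)  # 136 540
```

This matches the shapes printed in the question: y_train has 540 rows, so pairing it with the full 676-row features cannot work.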
Upvotes: 1