Reputation: 9
Load the popular digits dataset from the sklearn.datasets module and assign it to the variable digits.
Split digits.data into two sets named X_train and X_test. Also, split digits.target into two sets, Y_train and Y_test.
Hint: Use the train_test_split() method from sklearn.model_selection; set random_state to 30; and perform stratified sampling.
Build an SVM classifier from the X_train set and Y_train labels, with default parameters. Name the model svm_clf.
Evaluate the model accuracy on the testing data set and print its score.
I used the following code:
import sklearn.datasets as datasets
import sklearn.model_selection as ms
from sklearn.model_selection import train_test_split
digits = datasets.load_digits();
X = digits.data
y = digits.target
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=30)
print(X_train.shape)
print(X_test.shape)
from sklearn.svm import SVC
svm_clf = SVC().fit(X_train, y_train)
print(svm_clf.score(X_test,y_test))
I got the below output.
(1347,64)
(450,64)
0.4088888888888889
But I am not able to pass the test. Can someone help with what is wrong?
Upvotes: 0
Views: 2470
Reputation: 60390
You are missing the stratified sampling requirement; modify your split to include it:
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=30, stratify=y)
See the train_test_split documentation for details on the stratify parameter.
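For completeness, here is a minimal sketch of the whole exercise with the stratified split in place. The names full_share, test_share, max_gap, and score are my own additions for illustration; the class-proportion check is only there to show what stratify=y does and is not part of the exercise. The printed accuracy may differ between scikit-learn versions, so no expected value is given.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

digits = load_digits()

# stratify keeps each digit's share roughly equal in the train and test sets
X_train, X_test, Y_train, Y_test = train_test_split(
    digits.data, digits.target, random_state=30, stratify=digits.target
)
print(X_train.shape)
print(X_test.shape)

# Illustration only: compare class proportions in the full data and the test set
full_share = np.bincount(digits.target) / len(digits.target)
test_share = np.bincount(Y_test) / len(Y_test)
max_gap = np.abs(full_share - test_share).max()
print(max_gap)

# SVM classifier with default parameters, as the exercise asks
svm_clf = SVC().fit(X_train, Y_train)
score = svm_clf.score(X_test, Y_test)
print(score)
```

With stratification, max_gap should be very small, since each class keeps close to the same proportion in the test set as in the full dataset.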
Upvotes: 4