Reputation: 229
This is my code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_mldata
from sklearn import neighbors
from sklearn.model_selection import train_test_split
mnist = fetch_mldata('MNIST original')
sample = np.random.randint(70000, size=5000)
data = mnist.data[sample]
target = mnist.data[sample]
xtrain, xtest, ytrain, ytest = train_test_split(data, target, train_size=0.8)
knn = neighbors.KNeighborsClassifier(n_neighbors=3)
knn.fit(xtrain, ytrain)
error = 1 - knn.score(xtest, ytest)
print('Erreur: %f' % error)
When I run "python numb.py", I get this error message:
File "/anaconda/lib/python2.7/site-packages/sklearn/metrics/classification.py", line 88, in _check_targets
raise ValueError("{0} is not supported".format(y_type))
ValueError: multiclass-multioutput is not supported
Upvotes: 3
Views: 1673
Reputation: 13218
This is a simple typo: ytest has the wrong shape because target is taken from mnist.data instead of mnist.target. You should write
target = mnist.target[sample]
Correcting this, the script runs fine.
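To see why the original line triggers that particular error, here is a minimal sketch using small synthetic arrays as stand-ins for mnist.data and mnist.target (the shapes and values are assumptions for illustration, not the real dataset). It uses type_of_target from sklearn.utils.multiclass to show how scikit-learn classifies each kind of target:

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

rng = np.random.RandomState(0)

# Stand-in for mnist.data: 20 "images" of 784 integer pixel values each.
fake_data = rng.randint(0, 256, size=(20, 784))
# Stand-in for mnist.target: one digit label (0-9) per image.
fake_target = np.arange(20) % 10

# Using the pixel matrix as the target gives a 2-D y,
# which scikit-learn treats as a multi-output problem.
wrong_kind = type_of_target(fake_data)
# A 1-D vector of digit labels is an ordinary multiclass target.
right_kind = type_of_target(fake_target)

print(fake_data.shape, wrong_kind)
print(fake_target.shape, right_kind)
```

The 2-D target is what knn.score rejects, since its accuracy metric does not support multi-output targets.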
Also, the way you build sample, it may contain duplicates, which means some images may end up in both the train and test sets. Consider using np.random.permutation to shuffle the order of your samples.
And consider setting a seed before calling np.random, to get reproducible results (or better, use check_random_state from sklearn.utils).
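A minimal sketch of both suggestions, assuming the same sizes as in the question (70000 images, 5000 sampled) and an arbitrary seed of 0. It only builds the index array, so it runs without downloading MNIST:

```python
import numpy as np
from sklearn.utils import check_random_state

n_total = 70000   # number of MNIST images, as in the question
n_sample = 5000   # size of the subsample

# check_random_state turns a seed into a RandomState instance,
# so the same indices are drawn on every run. The seed 0 is arbitrary.
rng = check_random_state(0)

# Shuffle all indices and keep the first n_sample: no index can repeat.
sample = rng.permutation(n_total)[:n_sample]
n_unique_perm = len(np.unique(sample))

# For comparison: randint draws with replacement, so repeats are possible.
with_replacement = check_random_state(0).randint(n_total, size=n_sample)
n_unique_randint = len(np.unique(with_replacement))

print(n_unique_perm, n_unique_randint)
```

The resulting sample can then be used exactly as before, with data = mnist.data[sample] and target = mnist.target[sample]. Passing random_state to train_test_split as well would make the split itself reproducible.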
Upvotes: 4