Reputation: 13
I have a dataset, data, and a labeled array, target, with which I build a supervised model in scikit-learn using the k-Nearest Neighbors algorithm.
from sklearn.neighbors import KNeighborsClassifier
neigh = KNeighborsClassifier()
neigh.fit(data, target)
I can now classify my training set with this same model. To get the classification score:
neigh.score(data, target)
Now my problem is that this score depends on the type of the target object.
When target is created with list() and filled in with target.append(), the score method returns 0.68.
When target is created with target = np.empty(shape=(length,1), dtype="S36") (it contains only 36-character strings) and filled in with target[k] = value, the score method returns 0.008.
To check whether the predictions were really different, I created text files that list the results of
for k in data:
    neigh.predict(k)
in each case. The results were the same.
What can explain the score difference?
Upvotes: 1
Views: 1088
Reputation: 363818
@Harel spotted the problem, here's the explanation:
np.empty(shape=(length, 1), dtype="S36") creates an array of the wrong shape. scikit-learn estimators almost invariably want 1-d target arrays, i.e. shape=length. The fact that this doesn't raise an exception is an oversight.
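To illustrate why the shape matters, here is a minimal NumPy-only sketch. It assumes (this is not confirmed above) that the score is computed roughly as the mean of an element-wise comparison between predictions and targets. The labels in values and the predicted array are hypothetical stand-ins for the 36-character strings and for the output of neigh.predict(data).

```python
import numpy as np

length = 4
# Hypothetical labels standing in for the 36-character strings.
values = [b"a", b"b", b"a", b"c"]

# Shape (length, 1): a column vector, as built in the question.
target_2d = np.empty(shape=(length, 1), dtype="S36")
for k, value in enumerate(values):
    target_2d[k] = value

# Flattened to shape (length,), the 1-d form estimators expect.
target_1d = target_2d.ravel()

# Stand-in for neigh.predict(data): here, a perfect 1-d prediction.
predicted = target_1d.copy()

# 1-d against 1-d compares element by element.
acc_1d = np.mean(predicted == target_1d)

# 1-d against (length, 1) broadcasts to a (length, length) matrix,
# so every prediction is compared with every target.
pairwise = predicted == target_2d
acc_2d = np.mean(pairwise)

print(target_1d.shape, acc_1d)
print(pairwise.shape, acc_2d)
```

Under that assumption, identical predictions give a much lower number with the column-shaped target, which would match the symptom in the question. Passing a 1-d target, for example neigh.fit(data, target.ravel()) and neigh.score(data, target.ravel()), avoids the problem.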
Upvotes: 2