cardboard

Reputation: 13

Results differ depending on whether a list or a numpy array is used in scikit-learn

I have a dataset, data, and an array of labels, target, which I use to build a supervised k-Nearest Neighbors model in scikit-learn.

from sklearn.neighbors import KNeighborsClassifier
neigh = KNeighborsClassifier()
neigh.fit(data, target)

I can now classify my training set with this same model. To get the classification score:

neigh.score(data, target)


My problem is that this score depends on the type of the target object (a list or a numpy array).

To check whether the predictions were really different, I created text files listing the results of

for k in data:
    neigh.predict(k)

in each case. The results were the same.

What can explain the score difference?

Upvotes: 1

Views: 1088

Answers (1)

Fred Foo

Reputation: 363818

@Harel spotted the problem; here's the explanation:

np.empty(shape=(length, 1), dtype="S36")

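creates an array of the wrong shape: a column of shape (length, 1) rather than a flat array. scikit-learn estimators almost invariably want 1-d arrays for the target, i.e. shape=length (equivalently shape=(length,)). The fact that this doesn't raise an exception is an oversight.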
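
To make the shape issue concrete, here is a minimal NumPy-only sketch. The labels, the length of 4, and the names target_2d, target_1d and predicted are made up for illustration. It also assumes, without having checked scikit-learn's internals, that the accuracy boils down to an elementwise comparison of predictions against the target. Under that assumption, a column-shaped target broadcasts against 1-d predictions into a length x length matrix, which would be one plausible way to end up with a different score from identical predictions.

```python
import numpy as np

length = 4  # hypothetical number of samples

# Column-shaped target, as in the question: shape (length, 1).
target_2d = np.empty(shape=(length, 1), dtype="S36")
target_2d[:, 0] = [b"a", b"b", b"a", b"b"]  # made-up labels

# Flattened 1-d target, the shape estimators expect: (length,).
target_1d = target_2d.ravel()

# Stand-in for the classifier's output: assumed perfect and 1-d.
predicted = target_1d.copy()

# Elementwise comparison: one result per sample.
acc_1d = np.mean(predicted == target_1d)

# Broadcasting: (length,) against (length, 1) gives a (length, length)
# matrix, so every prediction is compared with every label.
acc_2d = np.mean(predicted == target_2d)

print(target_2d.shape, target_1d.shape)
print(acc_1d, acc_2d)
```

Either way, flattening the target with target.ravel() (or allocating it with shape=length in the first place) before calling fit and score avoids the ambiguity.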

Upvotes: 2
