John Sall

Reputation: 1163

Is there a difference between using accuracy_score with shuffled labels and without?

First, I have sorted labels: 40 rows labeled A, then 40 rows labeled B, 40 labeled C, and 40 labeled D, in that order, making a list of 160 labels.

After predicting with both sets of labels (shuffled and unshuffled), here are my scores:

shuffled:
0.14375

not shuffled:
0.30434782608695654

My question is: shouldn't both be the same? Or is this normal, and I'm not making a mistake?

Upvotes: 1

Views: 52

Answers (1)

doctorlove

Reputation: 19272

There are many circumstances in which the results change when the order of the training inputs is changed.

For example, the scikit-learn documentation for Nearest Neighbors warns:

Regarding the Nearest Neighbors algorithms, if two neighbors k+1 and k have identical distances but different labels, the result will depend on the ordering of the training data.
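
To make that tie-breaking effect concrete, here is a minimal sketch. It is a toy 1-D nearest-neighbour vote written from scratch, not scikit-learn's implementation; the function name `knn_predict` and the data are made up for illustration. The query point sits exactly halfway between two training points with different labels, so the prediction depends only on which of them comes first.

```python
from collections import Counter

def knn_predict(train_X, train_y, query, k=1):
    """Toy 1-D k-nearest-neighbours vote (illustrative, not scikit-learn's code)."""
    # sorted() is stable, so equally distant points keep their training order.
    order = sorted(range(len(train_X)), key=lambda i: abs(train_X[i] - query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Two training points, both at distance 1 from the query at 1.0.
X = [0.0, 2.0]
y = ["A", "B"]

pred_original = knn_predict(X, y, query=1.0)
pred_reversed = knn_predict(X[::-1], y[::-1], query=1.0)
print(pred_original, pred_reversed)
```

In this toy version the same training set gives "A" in one order and "B" in the other. How a real library breaks such ties is an implementation detail, so treat this only as an illustration of why order can matter.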

Other algorithms use the first few points to get started, so reordering the data can change your results.

Others will give different results when rerun, even if you don't change the order of inputs. Many machine learning algorithms use random numbers - this can make the results vary slightly. It's worth doing a few runs and giving an average when that happens.
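
As a rough sketch of the "do a few runs and average" advice, the snippet below repeats a small experiment with several seeds and reports the mean and spread. Everything in it is an assumption for illustration: the synthetic 1-D data, the hypothetical `run_once` helper, and the hand-written 1-nearest-neighbour rule stand in for whatever model and data you actually use.

```python
import random
import statistics

def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def run_once(seed):
    """One hypothetical experiment: random data and split, then a 1-NN guess."""
    rng = random.Random(seed)  # local generator, so the seed fully fixes the run
    # Synthetic 1-D data: two overlapping classes (made up for illustration).
    data = [(rng.gauss(0.0, 1.0), "A") for _ in range(40)]
    data += [(rng.gauss(1.5, 1.0), "B") for _ in range(40)]
    rng.shuffle(data)
    train, test = data[:60], data[60:]
    # Predict the label of the closest training point for each test point.
    preds = [min(train, key=lambda p: abs(p[0] - x))[1] for x, _ in test]
    return accuracy([label for _, label in test], preds)

scores = [run_once(seed) for seed in range(5)]
print("per-run accuracy:", scores)
print("mean:", statistics.mean(scores), "stdev:", statistics.stdev(scores))
```

Fixing the seed makes a single run reproducible, while the spread across seeds gives a feel for how much of a difference between two scores could be plain randomness.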

Upvotes: 1
