Reputation: 1163
My labels are sorted: for example, 40 rows labeled A, followed by 40 rows labeled B, 40 rows labeled C and 40 rows labeled D, all in this order, for a list of 160 labels.
After predicting with both versions of the labels (shuffled and unshuffled), here are my scores:
Shuffled:
0.14375
Not shuffled:
0.30434782608695654
My question is: shouldn't both be the same? Or is this normal, and I'm not making a mistake?
Upvotes: 1
Views: 52
Reputation: 19272
There are many circumstances under which the results can change when the order of the training inputs is changed.
For example, the scikit-learn Nearest Neighbors documentation warns:
Regarding the Nearest Neighbors algorithms, if two neighbors k+1 and k have identical distances but different labels, the result will depend on the ordering of the training data.
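To make that tie case concrete, here is a minimal hand-written 1-D k-NN in Python. It is an illustration, not scikit-learn's implementation: it breaks distance ties by training order (Python's `sorted` is stable), so swapping two equidistant training points with different labels flips the prediction. `knn_predict` is a hypothetical helper name.

```python
from collections import Counter

def knn_predict(train_X, train_y, x, k):
    """Minimal 1-D k-NN classifier (illustrative only).

    Training indices are sorted by distance to x; because sorted() is stable,
    points at equal distance keep their original training order.
    """
    order = sorted(range(len(train_X)), key=lambda i: abs(train_X[i] - x))
    nearest_labels = [train_y[i] for i in order[:k]]
    # most_common lists elements with equal counts in first-encountered order
    return Counter(nearest_labels).most_common(1)[0][0]

# Query 2.0 is exactly 1.0 away from both training points.
print(knn_predict([1.0, 3.0], ["A", "B"], 2.0, k=1))  # A
print(knn_predict([3.0, 1.0], ["B", "A"], 2.0, k=1))  # B
```

The training data is identical in both calls; only its order differs.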
Other algorithms use the first few points to get started, so a different order of the same data can lead to a different final model.
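As an illustration of order-dependent initialization, the sketch below is a naive 1-D k-means that, by assumption, seeds its centroids with the first k points. Library implementations usually seed differently (scikit-learn's `KMeans`, for instance, defaults to k-means++), so treat this as a toy. `kmeans_first_k` is a hypothetical name. The same four points, given in two different orders, converge to different clusterings.

```python
def kmeans_first_k(points, k, iters=10):
    """Naive 1-D k-means whose initial centroids are the first k points."""
    centroids = list(points[:k])
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign p to its nearest centroid (min() picks the lowest index on ties).
            j = min(range(k), key=lambda c: abs(p - centroids[c]))
            clusters[j].append(p)
        # Recompute each centroid as its cluster mean; keep the old one if empty.
        new = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new == centroids:  # converged
            break
        centroids = new
    return sorted(centroids)

print(kmeans_first_k([0, 4, 5, 9], 2))  # [0.0, 6.0]
print(kmeans_first_k([9, 5, 0, 4], 2))  # [3.0, 9.0]
```

Both results are stable fixed points of the algorithm; which one you reach depends only on which points come first.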
Others can give different results every time they are rerun, even if you don't change the order of the inputs: many machine learning algorithms use random numbers internally, and this can make the results vary slightly. When that happens, it's worth doing several runs and reporting the average.
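A minimal sketch of averaging over several runs, assuming a toy 1-D perceptron whose only source of randomness is a seeded shuffle of the samples at each epoch. `train_perceptron`, `accuracy` and `mean_and_spread` are hypothetical helper names, and the data is made up for illustration. Fixing the seed makes a single run reproducible; varying it and reporting the mean and standard deviation summarizes the run-to-run variation.

```python
import random
import statistics

def train_perceptron(X, y, seed, epochs=5):
    """1-D perceptron (weight + bias); sample order is reshuffled every epoch."""
    rng = random.Random(seed)  # seeded RNG: same seed -> same training run
    w, b = 0.0, 0.0
    idx = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(idx)
        for i in idx:
            if y[i] * (w * X[i] + b) <= 0:  # misclassified or on the boundary
                w += y[i] * X[i]
                b += y[i]
    return w, b

def accuracy(w, b, X, y):
    """Fraction of points whose predicted sign (+1/-1) matches the label."""
    return sum((1 if w * x + b > 0 else -1) == t for x, t in zip(X, y)) / len(X)

def mean_and_spread(X, y, seeds):
    """Train once per seed and return (mean, standard deviation) of accuracy."""
    scores = [accuracy(*train_perceptron(X, y, s), X, y) for s in seeds]
    return statistics.mean(scores), statistics.stdev(scores)

X = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
y = [-1, -1, 1, -1, 1, 1]  # not linearly separable, so runs can disagree
print(mean_and_spread(X, y, seeds=range(10)))
```

`statistics.stdev` needs at least two runs; with a single seed you can only report the one score.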
Upvotes: 1