Napoleon

Reputation: 13

Ordering of training dataset in Weka

I am new to Weka and am currently running some classification algorithms on a dataset I created.

The dataset has a class attribute {player1,player2,player3}, and its samples are sorted by player.

For example:

2,748.564,384.103,1.389,0.395,2354.950,0,1858.400,0.353,5,Player_1
1,729.143,391.086,1.479,0.378,2677.350,0,1496.900,0.333,3,Player_1
2,719.765,391.824,1.295,0.469,2659.625,0,1889.429,0.250,2,Player_1
1,726.515,388.121,1.506,0.360,2236.200,0,1431.800,0.364,4,Player_2
2,733.667,387.000,1.241,0.405,2612.450,0,2322.400,0.444,5,Player_2
1,744.343,380.000,1.516,0.366,2461.500,0,1455.050,0.417,3,Player_2
2,729.500,387.167,1.336,0.422,2150.167,0,2092.000,0.429,1,Player_3
1,734.100,398.700,1.522,0.311,2403.500,0,1497.550,0.214,3,Player_3

I found that if I randomly change this order, for example:

1,734.100,398.700,1.522,0.311,2403.500,0,1497.550,0.214,3,Player_3
2,748.564,384.103,1.389,0.395,2354.950,0,1858.400,0.353,5,Player_1
1,726.515,388.121,1.506,0.360,2236.200,0,1431.800,0.364,4,Player_2
2,733.667,387.000,1.241,0.405,2612.450,0,2322.400,0.444,5,Player_2
2,742.300,394.600,1.514,0.388,2530.833,0,1454.000,1.000,1,Player_3
.....

it will usually affect the classifiers' performance. Can someone explain why this happens? I used NaiveBayes, RandomForest and LMT as classifiers.

Thanks in advance, Napoleon

Upvotes: 1

Views: 657

Answers (1)

Matthew Spencer

Reputation: 2295

Changing the CV Folds parameter, the CV Random Seed or the order of the data will affect the accuracy of your classifiers.

Before your classifiers are trained, the cross-validation procedure randomly allocates each row to a training or testing set. Changing the number of CV folds changes how much data is used for training in each fold, which changes the result. Changing the seed produces a different allocation for each distinct value supplied. Likewise, if you reorder the data and keep the seed, the same row indexes are selected for training, but those indexes now point to different rows, so the training and testing sets differ and the results change with them.
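To make the last point concrete, here is a minimal sketch in plain Python. It is not Weka's code: `fold_assignment` and the row labels are hypothetical, and the seeded shuffle followed by round-robin dealing is only a simplified stand-in for a cross-validation split. It shows that a fixed seed fixes the permutation of positions, not of rows, so reordering the input changes what ends up in each fold.

```python
import random


def fold_assignment(rows, num_folds, seed):
    """Shuffle a copy of rows with a seeded RNG, then deal them into folds round-robin."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)  # the permutation depends only on the seed and the list length
    return [shuffled[i::num_folds] for i in range(num_folds)]


# Hypothetical row labels standing in for the dataset's instances.
original = ["P1_a", "P1_b", "P2_a", "P2_b", "P3_a", "P3_b"]
reordered = ["P3_b", "P1_a", "P2_a", "P2_b", "P3_a", "P1_b"]

folds_original = fold_assignment(original, num_folds=3, seed=1)
folds_repeat = fold_assignment(original, num_folds=3, seed=1)
folds_reordered = fold_assignment(reordered, num_folds=3, seed=1)

print("original order :", folds_original)
print("same order again:", folds_repeat)
print("reordered input :", folds_reordered)
```

Running the same order twice with the same seed gives identical folds, while the reordered input gives a different assignment even though the seed is unchanged. A classifier trained on different folds will generally report a somewhat different accuracy.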

Upvotes: 1
