Reputation: 83
I am using Weka to analyze data with a binary outcome. I initially use 10-fold cross-validation and use 66% of the dataset for training data. The accuracy I get with this is 77.1% (correctly classified instances). I then try to see what happens when I use an 80% split instead of 66%, but with the same cross-validation. The accuracy I get is only marginally better, at 77.25%. Worse, when I use 20-fold cross-validation, then 50-fold cross-validation, absolutely NO improvement is obtained. I thought the whole idea of using more folds was to improve the accuracy! And when I use a 90% split with 10 folds or even 20 folds, the accuracy drops to 74%.

Can someone please tell me why my accuracy is NOT improving drastically when I use a larger split, and does not improve AT ALL when I use more cross-validation folds?
Upvotes: 0
Views: 2294
Reputation: 397
I then try to see what happens when I use an 80% split instead of 66%, but with the same cross-validation.
I think you have it wrong: percentage split and cross-validation are two different, mutually exclusive options for error estimation. With a percentage split, the data is divided once according to the percentage you choose, the algorithm is trained on the training portion, and it is evaluated on the test portion. With k-fold cross-validation, the data is divided into k folds, and the model is trained and evaluated k times, each time holding out a different fold as the test set.
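To make the two options concrete, here is a minimal Java sketch against the Weka API; the file name mydata.arff is a placeholder, and J48 is just a stand-in for whatever classifier you are actually using:

```java
import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class SplitVsCrossValidation {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("mydata.arff"); // placeholder dataset
        data.setClassIndex(data.numAttributes() - 1);    // class is the last attribute

        // Option 1: percentage split -- train on 66%, evaluate on the remaining 34%
        data.randomize(new Random(1));
        int trainSize = (int) Math.round(data.numInstances() * 0.66);
        Instances train = new Instances(data, 0, trainSize);
        Instances test  = new Instances(data, trainSize, data.numInstances() - trainSize);
        J48 tree = new J48();
        tree.buildClassifier(train);
        Evaluation splitEval = new Evaluation(train);
        splitEval.evaluateModel(tree, test);
        System.out.println("66% split accuracy: " + splitEval.pctCorrect());

        // Option 2: 10-fold cross-validation -- uses ALL the data;
        // the percentage split above plays no role here
        Evaluation cvEval = new Evaluation(data);
        cvEval.crossValidateModel(new J48(), data, 10, new Random(1));
        System.out.println("10-fold CV accuracy: " + cvEval.pctCorrect());
    }
}
```

In the Weka Explorer these correspond to the mutually exclusive "Cross-validation" and "Percentage split" radio buttons on the Classify tab: selecting one means the setting of the other is ignored.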
As for performance improvement, a larger training set is no guarantee of better accuracy. Maybe 66% of your data is already enough for the classifier to learn everything it can.
As for cross-validation, 10 folds is considered a good default in most cases, and more folds are very unlikely to raise your accuracy. Keep in mind that cross-validation is a way of estimating how well your classifier performs, not a way of making it perform better: changing the number of folds only changes the estimate. You should read up more on cross-validation.
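If you want to see this for yourself, a small sketch (same placeholder file and stand-in classifier as above) that repeats the evaluation with different fold counts will show only minor fluctuation in the estimate, never a genuinely better classifier:

```java
import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class FoldComparison {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("mydata.arff"); // placeholder dataset
        data.setClassIndex(data.numAttributes() - 1);

        // The fold count only changes how the accuracy ESTIMATE is computed;
        // the classifier and the data stay exactly the same.
        for (int folds : new int[] {10, 20, 50}) {
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(new J48(), data, folds, new Random(1));
            System.out.printf("%2d-fold CV accuracy: %.2f%%%n", folds, eval.pctCorrect());
        }
    }
}
```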
One final remark: if your accuracy is not satisfactory, you should probably try another classifier, or play with your classifier's parameters.
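For the parameter part, Weka has a meta-classifier, weka.classifiers.meta.CVParameterSelection, that searches a parameter range using internal cross-validation. A sketch, again using J48 and its -C pruning confidence purely as an example:

```java
import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.meta.CVParameterSelection;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class ParameterTuning {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("mydata.arff"); // placeholder dataset
        data.setClassIndex(data.numAttributes() - 1);

        // Try J48's pruning confidence factor -C from 0.1 to 0.5 in 5 steps,
        // picking the value that scores best under internal cross-validation.
        CVParameterSelection tuned = new CVParameterSelection();
        tuned.setClassifier(new J48());
        tuned.setNumFolds(10);
        tuned.addCVParameter("C 0.1 0.5 5");
        tuned.buildClassifier(data);
        System.out.println("Best options: " + String.join(" ", tuned.getBestClassifierOptions()));

        // Estimate the tuned classifier's accuracy with an outer cross-validation
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(tuned, data, 10, new Random(1));
        System.out.println("Tuned 10-fold CV accuracy: " + eval.pctCorrect());
    }
}
```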
Upvotes: 0