Amro

Reputation: 124563

Splitting data into training/testing datasets in MATLAB?

After some research I found two functions in MATLAB for this task:

  • cvpartition (Statistics Toolbox)
  • crossvalind (Bioinformatics Toolbox)

I've used cvpartition before to create n-fold cross-validation subsets, along with the dataset/nominal classes from the Statistics Toolbox. What are the differences between the two functions, and what are the pros and cons of each?
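For readers comparing the two, here is a minimal sketch of how each function might be used to get 5-fold indices. It assumes both the Statistics and Bioinformatics toolboxes are installed; the values of n and k and the variable names are made up for illustration.

```matlab
% Sketch: 5-fold cross-validation indices from both functions.
n = 100;   % hypothetical number of observations
k = 5;     % hypothetical number of folds

% cvpartition returns an object; query it for logical masks per fold.
c = cvpartition(n, 'KFold', k);
for i = 1:c.NumTestSets
    trainMask = training(c, i);   % n-by-1 logical, true for training rows
    testMask  = test(c, i);       % n-by-1 logical, true for test rows
    fprintf('cvpartition fold %d: %d train, %d test\n', ...
            i, sum(trainMask), sum(testMask));
end

% crossvalind returns a vector of fold labels in 1..k; build the masks yourself.
foldId = crossvalind('Kfold', n, k);
for i = 1:k
    testMask  = (foldId == i);
    trainMask = ~testMask;
    fprintf('crossvalind fold %d: %d train, %d test\n', ...
            i, sum(trainMask), sum(testMask));
end
```

The practical difference visible here is that cvpartition keeps the whole partition in one object, while crossvalind hands back plain index vectors that you manage yourself.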

Upvotes: 9

Views: 18254

Answers (4)

Dave

Reputation:

Expanding on @Mr Fooz's answer

They look to be pretty similar based on the official docs of cvpartition and crossvalind, but crossvalind looks slightly more flexible (it allows for leave M out for arbitrary M, whereas cvpartition only allows for leave 1 out).

... isn't it true that you can always simulate leave-M-out using k-fold cross-validation with an appropriate value of k? That is, split the data into k folds of M observations each, test on one fold, train on all the others, repeat for every fold, and take the average. Leave-one-out is then just the special case of k-fold where k equals the number of observations.
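A small sketch in base MATLAB (no toolboxes) of the idea in this answer, assuming n is divisible by M; the sizes and variable names are hypothetical. It also shows where the simulation stops short: k-fold tests each observation exactly once, whereas an exhaustive leave-M-out would visit every subset of size M.

```matlab
n = 12;        % hypothetical number of observations
M = 3;         % hypothetical number of observations held out per split
k = n / M;     % number of folds so that each fold holds M observations

perm   = randperm(n);               % shuffle the observation indices
foldId = zeros(n, 1);
foldId(perm) = ceil((1:n)' / M);    % M shuffled observations per fold label

testCount = zeros(n, 1);            % how often each observation is tested
for i = 1:k
    testIdx  = find(foldId == i);   % the M held-out observations
    trainIdx = find(foldId ~= i);   % the remaining n - M observations
    testCount(testIdx) = testCount(testIdx) + 1;
    % fit a model on trainIdx and evaluate it on testIdx here
end

numKfoldSplits      = k;                % 4 disjoint test sets
numExhaustiveSplits = nchoosek(n, M);   % 220 possible test sets of size M
```

So k-fold with folds of size M gives one particular set of n/M disjoint leave-M-out splits, not all of them.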

Upvotes: 3

Amelio Vazquez-Reina

Reputation: 96274

Amro, this does not directly answer your cvpartition vs crossvalind question, but there is a contribution on the MathWorks File Exchange called MulticlassGentleAdaboosting, by user Sebastian Paris, that includes a nice set of functions for enumerating the array indices of training, testing and validation sets under the following sampling and cross-validation strategies:

  • Hold out
  • Bootstrap
  • K Cross-validation
  • Leave One Out
  • Stratified Cross Validation
  • Balanced Stratified Cross Validation
  • Stratified Hold out
  • Stratified Boot Strap

For details, see the demo files included in the package, and more specifically the functions sampling.m and sampling_set.m.
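To give a feel for what one of these strategies does, here is a rough base-MATLAB sketch of a stratified hold-out. It does not use or reproduce sampling.m or sampling_set.m from the package; the labels, the hold-out fraction and all variable names are assumptions for illustration.

```matlab
labels       = [ones(30,1); 2*ones(60,1); 3*ones(10,1)];  % hypothetical class labels
testFraction = 0.2;                                        % hypothetical hold-out fraction

testMask = false(size(labels));
classes  = unique(labels);
for c = 1:numel(classes)
    idx   = find(labels == classes(c));         % rows belonging to this class
    nTest = round(testFraction * numel(idx));   % per-class test count
    pick  = idx(randperm(numel(idx), nTest));   % random rows from this class
    testMask(pick) = true;
end
trainMask = ~testMask;   % everything not held out is used for training
```

Sampling within each class keeps the class proportions of the test set close to those of the full data, which is the point of the "stratified" variants in the list.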

Upvotes: 3

Aman

Reputation: 153

I know your question does not refer directly to the Neural Network Toolbox, but perhaps someone else will find this useful. To separate your ANN input data into training/validation/test sets, use the 'net.divideFcn' property.

net.divideFcn = 'divideind';

net.divideParam.trainInd=1:94;  % The first 94 inputs are for training.
net.divideParam.valInd=1:94;    % The same 94 inputs are reused for validation; use a disjoint range for an independent validation set.
net.divideParam.testInd=95:100; % The last 6 inputs are for testing the network.
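If you would rather have three non-overlapping sets, one possible way to build the index vectors in base MATLAB is sketched below. The split sizes are hypothetical, and the assignment to net is left as comments because it assumes an existing network object from the toolbox.

```matlab
n      = 100;           % hypothetical number of input samples
nTrain = 70;            % hypothetical training set size
nVal   = 15;            % hypothetical validation set size; the rest is test

perm     = randperm(n);                       % shuffle the sample indices
trainInd = perm(1:nTrain);
valInd   = perm(nTrain+1 : nTrain+nVal);
testInd  = perm(nTrain+nVal+1 : end);

% Sanity check: the three sets are disjoint and together cover all samples.
allInd = sort([trainInd, valInd, testInd]);
assert(isequal(allInd, 1:n));

% With an existing network object net, the indices could then be assigned:
%   net.divideFcn = 'divideind';
%   net.divideParam.trainInd = trainInd;
%   net.divideParam.valInd   = valInd;
%   net.divideParam.testInd  = testInd;
```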

Upvotes: 1

Mr Fooz

Reputation: 111856

They look to be pretty similar based on the official docs of cvpartition and crossvalind, but crossvalind looks slightly more flexible (it allows for leave M out for arbitrary M, whereas cvpartition only allows for leave 1 out).
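A short sketch of the difference described here, assuming both toolboxes are available; n and M are hypothetical values. As far as I can tell from the docs, each crossvalind 'LeaveMOut' call draws one random set of M observations, so repeated calls are not guaranteed to give disjoint test sets.

```matlab
n = 20;   % hypothetical number of observations
M = 4;    % hypothetical number of observations to hold out

% crossvalind: one random leave-M-out split, returned as logical vectors.
[trainMask, testMask] = crossvalind('LeaveMOut', n, M);
fprintf('crossvalind: %d train, %d test\n', sum(trainMask), sum(testMask));

% cvpartition: leave-one-out only, giving n test sets of one observation each.
c = cvpartition(n, 'LeaveOut');
fprintf('cvpartition: %d test sets, %d observation in the first\n', ...
        c.NumTestSets, sum(test(c, 1)));
```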

Upvotes: 1
