anon

Reputation: 342

Why does crossvalind fail?

I am using the crossvalind function on a very small data set, but it gives me results that look incorrect. Is this supposed to happen?

I have MATLAB R2012a, and here is my output:

crossvalind('KFold',1:1:11,5)

ans =

 2
 5
 1
 3
 2
 1
 5
 3
 5
 1
 5

Notice the absence of set 4. Is this a bug? I expected at least 2 elements per set, but one set gets 0. This happens a lot; that is, the values are not uniformly distributed across the sets.

Upvotes: 1

Views: 1431

Answers (1)

Richante

Reputation: 4388

The help for crossvalind says that the form you are using is crossvalind(METHOD, GROUP, ...). In this form, GROUP is a grouping vector, e.g. the class labels of your data. So 1:11 as the second argument is confusing here, because it says that no two examples have the same label. I think this is sufficiently unusual that you shouldn't be surprised if the function does something strange.

I tried doing:

numel(unique(crossvalind('KFold', rand(11, 1) > 0.5, 5)))

and it reliably gave 5 as a result, which is what I would expect. My example corresponds to a two-class problem. I would guess that, as a general rule, you'd want something like numel(unique(group)) <= numel(group) / folds. My hypothesis is that it tries to have one example of each class in the Kth fold, and at least 2 examples in every other fold, with a difference between fold sizes of no more than 1, but I haven't looked at the code to verify this.

It is possible that you mean to do:

crossvalind('KFold', 11, 5);

which computes 5 folds for 11 data points. This form doesn't attempt to do anything clever with labels, so you can be sure that all K folds are used.
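If you want to see what a label-free, balanced assignment looks like without relying on crossvalind, here is a minimal sketch using only base MATLAB. It is not how crossvalind works internally (I haven't checked that); it simply deals a random permutation of the points into folds round-robin. The variable names (n, k, indices, fold_sizes) are my own.

```matlab
n = 11;   % number of data points, as in the question
k = 5;    % number of folds, as in the question

% Shuffle 1..n, then map each value to a fold label in 1..k.
% Because every value 1..n appears exactly once, every fold is used
% and fold sizes differ by at most 1.
indices = mod(randperm(n), k)' + 1;

% Count how many points landed in each fold.
fold_sizes = accumarray(indices, 1, [k 1]);
disp(fold_sizes')
```

The same accumarray call can be applied to the output of crossvalind to check whether any fold came back empty.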

However, in your problem, if you really have very few data points, then it is probably better to do leave-one-out cross validation, which you could do with:

crossvalind('LeaveMOut', 11, 1);

although a better method would be:

for leave_out = 1:11
  % fold_number is a logical mask over the 11 examples:
  % 0 marks the left-out example, 1 marks the examples in the main fold.
  fold_number = (1:11) ~= leave_out;
  % ... your code here ...
end
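To make the loop concrete, here is a small sketch with made-up data and a deliberately trivial "model" (predict the mean of the training examples). The data y, the model, and the names train_mask, errs and loo_mse are all assumptions for illustration; substitute your own training and evaluation code.

```matlab
y = (1:11)';            % hypothetical toy data: 11 scalar observations
n = numel(y);
errs = zeros(n, 1);     % squared error for each left-out example

for leave_out = 1:n
  train_mask = (1:n)' ~= leave_out;      % true for the n-1 training examples
  prediction = mean(y(train_mask));      % "train": mean of the training data
  errs(leave_out) = (y(leave_out) - prediction)^2;  % "test" on the left-out one
end

loo_mse = mean(errs);   % leave-one-out estimate of the mean squared error
```

Each example is held out exactly once, so there is no way for a fold to end up empty.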

Upvotes: 2
