Learner

Sample specific rows in MATLAB

I have a label vector Label of dimension N x 1. Example entries are given below:

Label = [1; 3; 5; ....... 6]

I would like to randomly sample m1 records with label 1, m2 records with label 2, and so on, so that the output LabelIndicatorMatrix (also N x 1) looks something like

LabelIndicatorMatrix = [1; 1; 0;.....1]

where 1 means the record was chosen during sampling and 0 means it was not. The output must satisfy the following condition:

sum(LabelIndicatorMatrix) = m1 + m2 + ... + m6
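This condition can only be met when every label k occurs at least m_k times in Label. A quick feasibility check, written as a sketch that assumes the labels are the integers 1..6 and that m1..m6 are stored in a row vector m (both the data and m below are hypothetical):

```matlab
Label = randi([1 6], [100 1]);   % hypothetical example labels
m = [2 3 1 0 1 2];               % hypothetical per-label sample sizes m1..m6

% number of records carrying each label 1..numel(m)
counts = accumarray(Label, 1, [numel(m) 1]);

% sum(LabelIndicatorMatrix) == sum(m) is reachable only if no label
% is asked for more records than it actually has
feasible = all(counts >= m(:));
maxSelectable = sum(min(counts, m(:)));   % most records any sampler can mark
```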

Answers (2)

Amro

One possible solution:

Label = randi([1 6], [100 1]);  %# random Nx1 vector of labels
m = [2 3 1 0 1 2];              %# number of records to sample from each category

LabelIndicatorMatrix = false(size(Label));   %# marks selected records
uniqL = unique(Label);                       %# unique labels: 1,2,3,4,5,6
for i=1:numel(uniqL)
    idx = find(Label == uniqL(i));           %# indices where label==k
    ord = randperm(length(idx));             %# random permutation
    ord = ord(1:min(m(uniqL(i)),end));       %# pick first m_k (index m by label value, not loop position)
    LabelIndicatorMatrix( idx(ord) ) = true; %# mark them as selected
end

To make sure we satisfy the requirement, we check (this holds as long as every label occurs at least m(k) times):

>> sum(LabelIndicatorMatrix) == sum(m)
ans =
     1
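The same loop can be written without shuffling every group in full. This is a sketch assuming MATLAB R2011b or later, where randperm(n, k) returns k distinct integers from 1..n, and assuming the labels are the integers 1..numel(m):

```matlab
Label = randi([1 6], [100 1]);   % random Nx1 vector of labels
m = [2 3 1 0 1 2];               % records to sample from each label

LabelIndicatorMatrix = false(size(Label));
for k = 1:numel(m)                          % assumes labels are 1..numel(m)
    idx = find(Label == k);                 % rows carrying label k
    nk = min(m(k), numel(idx));             % cannot take more rows than exist
    pick = randperm(numel(idx), nk);        % nk distinct positions in idx
    LabelIndicatorMatrix(idx(pick)) = true; % mark them as selected
end
```

Looping over k = 1:numel(m) instead of over unique(Label) also keeps m(k) tied to label k even when some label is absent.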

Here is my attempt at a vectorized solution:

Label = randi([1 6], [100 1]);  %# random Nx1 vector of labels
m = [2 3 1 0 1 2];              %# number of records to sample from each category

%# some helper functions
firstN = @(V,n) V(1:min(n,end));                  %# first n elements from vector
pickN = @(V,n) firstN(V(randperm(length(V))), n); %# pick n elements from vector

%# randomly sample labels, and get indices
uL = unique(Label)';                        %# unique labels present (row vector)
idx = bsxfun(@eq, Label, uL);               %# idx(:,k) indicates where Label==uL(k)
[r, c] = find(idx);                         %# row/column indices
idx = arrayfun(@(k) pickN(r(c==k), m(uL(k))), 1:size(idx,2), ...
               'UniformOutput',false);      %# sample m(uL(k)) rows with Label==uL(k)

%# mark selected records
LabelIndicatorMatrix = false(size(Label));
LabelIndicatorMatrix( vertcat(idx{:}) ) = true;

%# check results are correct (holds when every label has at least m(k) records)
assert( sum(LabelIndicatorMatrix)==sum(m) )
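A different fully vectorized route avoids arrayfun: shuffle all records once, sort the shuffled labels so each label forms a contiguous group, compute every record's rank inside its group, and keep ranks up to m(k). This is a sketch, not the approach above; it assumes labels are the integers 1..numel(m) and relies on sort being stable, so the random order survives within each group:

```matlab
Label = randi([1 6], [100 1]);   % random Nx1 vector of labels
m = [2 3 1 0 1 2];               % records to sample from each label
N = numel(Label);

p = randperm(N)';                % random order of all records (column)
[sl, ord] = sort(Label(p));      % group by label; stable sort keeps the
q = p(ord);                      % random order within each group

% position of every record inside its label group (1, 2, 3, ...)
isFirst  = [true; diff(sl) ~= 0];       % marks where a new label group begins
grpStart = find(isFirst);               % row in sl where each group starts
grpId    = cumsum(isFirst);             % group number of every row
rnk      = (1:N)' - grpStart(grpId) + 1;

% keep the first m(k) records of each group k (fewer if the group is smaller)
mk = m(:);                              % column, so mk(sl) is a column too
LabelIndicatorMatrix = false(N, 1);
LabelIndicatorMatrix(q(rnk <= mk(sl))) = true;
```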

Aabaz

You could start with this small piece of code. It draws random indices into your label vector and marks which records have been selected at least once:

Label = [1; 3; 5; ....... 6];
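N = numel(Label);                  % number of records
index = randi(N, m1, 1);           % m1 random row indices, drawn with replacement
index = unique(index);             % drop repeats (so possibly fewer than m1 rows)
LabelIndicatorMatrix = zeros(N,1);
LabelIndicatorMatrix(index) = 1;   % mark the selected rows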

That said, I am not sure I understand the final condition on LabelIndicatorMatrix.
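Because randi draws with replacement, unique(index) can leave fewer than m1 rows. If the goal is exactly m1 distinct rows (still ignoring the labels, as the snippet above does), a sketch using the two-argument randperm (MATLAB R2011b or later) avoids duplicates altogether; the example Label and m1 values are hypothetical:

```matlab
Label = randi([1 6], [100 1]);   % hypothetical example labels
m1 = 5;                          % hypothetical number of rows to draw
N = numel(Label);

% randperm(N, k) returns k distinct indices, so nothing needs removing
index = randperm(N, min(m1, N));
LabelIndicatorMatrix = zeros(N, 1);
LabelIndicatorMatrix(index) = 1;
```

Sampling per label, as the question asks, means repeating this within the rows of each label.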
