Reputation: 960
I have a scenario in which there is a Label matrix of dimension N x 1. Example entries of the label matrix are given below:
Label = [1; 3; 5; ....... 6]
I would like to randomly sample 'm1' records of label 1, 'm2' records of label 2, etc., so that the output LabelIndicatorMatrix (N x 1 dimension) looks something like
LabelIndicatorMatrix = [1; 1; 0;.....1]
1 means the record was chosen during sampling, 0 means it was not. The output matrix satisfies the following condition:
Sum(LabelIndicatorMatrix) = m1 + m2 + ... + m6
Upvotes: 1
Views: 218
Reputation: 124563
One possible solution:
Label = randi([1 6], [100 1]); %# random Nx1 vector of labels
m = [2 3 1 0 1 2]; %# number of records to sample from each category
LabelIndicatorMatrix = false(size(Label)); %# marks selected records
uniqL = unique(Label); %# unique labels: 1,2,3,4,5,6
for i=1:numel(uniqL)
    k = uniqL(i); %# current label k (assumed to be an integer in 1..numel(m))
    idx = find(Label == k); %# indices where label==k
    ord = randperm(length(idx)); %# random permutation
    ord = ord(1:min(m(k),end)); %# pick first m(k), or all of them if fewer exist
    LabelIndicatorMatrix( idx(ord) ) = true; %# mark them as selected
end
To make sure we satisfy the requirements, we check:
>> sum(LabelIndicatorMatrix) == sum(m)
ans =
1
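The total alone does not show that each label received its own quota. A per-label check could look like the following sketch. It assumes the labels are the integers 1 to numel(m) and that every label has at least m(k) records; the small Label vector is made up for illustration and is not from the question.

```matlab
Label = [1; 3; 5; 6; 1; 2; 2; 2; 3; 5; 6; 6; 1; 2; 4]; %# made-up labels, for illustration only
m = [2 3 1 0 1 2];                                     %# requested sample size per label

LabelIndicatorMatrix = false(size(Label));
for k = 1:numel(m)
    idx = find(Label == k);                 %# records carrying label k
    ord = randperm(numel(idx));             %# shuffle their positions
    ord = ord(1:min(m(k), numel(ord)));     %# keep at most m(k) of them
    LabelIndicatorMatrix(idx(ord)) = true;  %# mark them as selected
end

%# count the selected records per label and compare with the quotas
counts = accumarray(Label(LabelIndicatorMatrix), 1, [numel(m) 1]);
assert(isequal(counts(:), m(:)))
```

If some label has fewer than m(k) records, the assertion fails, which is the same case in which the sum check above would no longer hold.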
Here is my attempt at a vectorized solution:
Label = randi([1 6], [100 1]); %# random Nx1 vector of labels
m = [2 3 1 0 1 2]; %# number of records to sample from each category
%# some helper functions
firstN = @(V,n) V(1:min(n,end)); %# first n elements from vector
pickN = @(V,n) firstN(V(randperm(length(V))), n); %# pick n elements from vector
%# randomly sample labels, and get indices
idx = bsxfun(@eq, Label, unique(Label)'); %'# idx(:,k) indicates where label==k
[r c] = find(idx); %# row/column indices
idx = arrayfun(@(k) pickN(r(c==k),m(k)), 1:size(idx,2), ...
'UniformOutput',false); %# sample m(k) from labels==k
%# mark selected records
LabelIndicatorMatrix = false(size(Label));
LabelIndicatorMatrix( vertcat(idx{:}) ) = true;
%# check results are correct
assert( sum(LabelIndicatorMatrix)==sum(m) )
Upvotes: 2
Reputation: 3116
You could start with this little code sample: it draws random indices into your label vector and marks which records have been selected at least once:
Label = [1; 3; 5; ....... 6];
N = numel(Label); %# number of records
index = randi(N,m1,1); %# m1 random indices, drawn with replacement
index = unique(index);
LabelIndicatorMatrix = zeros(N,1);
LabelIndicatorMatrix(index)=1;
That said, I am not sure I understand the final condition on the LabelIndicatorMatrix.
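One caveat: randi draws with replacement, so after unique(index) fewer than m1 records may end up marked, and the draw is not restricted to a particular label. A minimal sketch of a variant for a single label, assuming the two-argument form randperm(n,k) is available in your MATLAB release and that label 1 has at least m1 records (the Label vector and the value of m1 are made up for illustration):

```matlab
Label = [1; 3; 5; 6; 1; 2; 1; 3]; %# made-up labels, for illustration only
N  = numel(Label);
m1 = 2;                           %# records wanted from label 1 (assumed value)

cand = find(Label == 1);                 %# candidate records: those with label 1
pick = cand(randperm(numel(cand), m1));  %# m1 distinct candidates, no repeats

LabelIndicatorMatrix = zeros(N,1);
LabelIndicatorMatrix(pick) = 1;          %# exactly m1 entries are set to 1
```

Repeating this for each label with its own count, and marking the picks in the same vector, gives a total of m1 + m2 + ... + m6 selected records.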
Upvotes: 1