Reputation: 21
I need to generate a random matrix of K
columns and N
rows containing ones and zeroes, such that:
a) Each row contains exactly k
ones.
b) Each row is different from the other (combinatorics imposes that if N
> nchoosek(K, k)
there will be nchoosek(K,k)
rows).
Assume I want N = 10000
(out of all the possible nchoosek(K, k) = 27405
combinations), different 1×K vectors (with K = 30
) containing k (with k = 4
) ones and K - k
zeroes.
This code:
clear all; close
N=10000; K=30; k=4;
M=randi([0 1],N,K);
plot(sum(M,2)) % condition a) not satisfied
does not satisfy neither a) nor b).
This code:
clear all; close;
N=10000;
NN=N; K=30; k=4;
tempM=zeros(NN,K);
for ii=1:NN
ttmodel=tempM(ii,:);
ttmodel(randsample(K,k,false))=1; %satisfies condition a)
tempM(ii,:)=ttmodel;
end
Check=bi2de(tempM); %from binary to decimal
[tresh1,ind,tresh2] = unique(Check);%drop the vectors that appear more than once in the matrix
M=tempM(ind,:); %and satisfies condition b)
plot(sum(M,2)) %verify that condition a) is satisfied
%Effective draws, Wanted draws, Number of possible combinations to draw from
[sum(sum(M,2)==k) N nchoosek(K,k) ]
satisfies condition a) and partially condition b). I say partially because unless NN>>N the final matrix will contain less than N
rows each different from each other.
Is there a better and faster way (that possible avoids the for cycle and the need of having NN>>N) to solve the problem?
Upvotes: 2
Views: 839
Reputation: 32930
First, generate N unique k-long permutations of the positions of ones:
cols = randperm(K, N);
cols = cols(:, 1:k);
Then generate the matching row indices:
rows = meshgrid(1:N, 1:k)';
and finally create the sparse matrix with:
A = sparse(rows, cols, 1, N, K);
To obtain the full form of the matrix, use full(A)
.
K = 10;
k = 4;
N = 5;
cols = randperm(K, N);
cols = cols(:, 1:k);
rows = meshgrid(1:N, 1:k)';
A = sparse(rows, cols , 1, N, K);
full(A)
The result I got is:
ans =
1 1 0 0 0 0 0 1 0 1
0 0 1 1 0 1 0 0 0 1
0 0 0 1 1 0 1 0 1 0
0 1 0 0 0 0 1 0 1 1
1 1 1 0 0 1 0 0 0 0
This computation should be pretty fast even for large values of K and N. For K = 30, k = 4, N = 10000 the result was obtained in less than 0.01 seconds.
Upvotes: 2
Reputation: 13171
If you have enough memory for nchoosek(K,k) integers, build an array of those, use a partial Fisher-Yates shuffle to get a proper uniformly random subset of N of those. Now, given the array of N integers, interpret each as the rank of the combination representing each row of your final array. If you use colexicographical ordering of combinations, computing the combination from a rank is pretty simple (though it uses lots of binomial combination functions, so it pays to have a fast one).
I'm not a Matlab guy, but I've done things similar to this in C. This code, for example:
for (i = k; i >= 1; --i) {
while ((b = binomial(n, i)) > r) --n;
buf[i-1] = n;
r -= b;
}
will fill the array buf[]
with indices from 0
to n-1
for the r
th combination of k
out of n
elements in colex order. You would interpret these as the positions of the 1
s in your row.
Upvotes: 0
Reputation: 194
You could use randperm(n) to generate random sequences of integers from 1 to n, and store the nonrepeated sequences as rows in a matrix M until size(unique(M,'rows'),1)==size(M,1). Then you could use M to index a logical matrix with the appropriate number of true values in each row.
Upvotes: 0