Antonio Gargano
Antonio Gargano

Reputation: 21

Random binary matrix with two non-trivial constraints

I need to generate a random matrix of K columns and N rows containing ones and zeroes, such that:

a) Each row contains exactly k ones.
b) Each row is different from the other (combinatorics imposes that if N > nchoosek(K, k) there will be nchoosek(K,k) rows).

Assume I want N = 10000 (out of all the possible nchoosek(K, k) = 27405 combinations), different 1×K vectors (with K = 30) containing k (with k = 4) ones and K - k zeroes.

This code:

clear all; close
N=10000; K=30; k=4;
M=randi([0 1],N,K);
plot(sum(M,2)) % condition a) not satisfied

does not satisfy neither a) nor b).

This code:

clear all; close;
N=10000;
NN=N;  K=30; k=4;
tempM=zeros(NN,K);   
for ii=1:NN
ttmodel=tempM(ii,:);
ttmodel(randsample(K,k,false))=1;  %satisfies condition a)
tempM(ii,:)=ttmodel;
end
Check=bi2de(tempM);                    %from binary to decimal
[tresh1,ind,tresh2] = unique(Check);%drop the vectors that appear more than once in the   matrix
M=tempM(ind,:);                             %and satisfies condition b)
plot(sum(M,2))                                  %verify that condition a) is satisfied
%Effective draws, Wanted draws, Number of possible combinations to draw from
[sum(sum(M,2)==k) N nchoosek(K,k) ]  

satisfies condition a) and partially condition b). I say partially because unless NN>>N the final matrix will contain less than N rows each different from each other.

Is there a better and faster way (that possible avoids the for cycle and the need of having NN>>N) to solve the problem?

Upvotes: 2

Views: 839

Answers (3)

Eitan T
Eitan T

Reputation: 32930

First, generate N unique k-long permutations of the positions of ones:

cols = randperm(K, N);
cols = cols(:, 1:k);

Then generate the matching row indices:

rows = meshgrid(1:N, 1:k)';

and finally create the sparse matrix with:

A = sparse(rows, cols, 1, N, K);

To obtain the full form of the matrix, use full(A).

Example

K = 10;
k = 4;
N = 5;

cols = randperm(K, N);
cols = cols(:, 1:k);
rows = meshgrid(1:N, 1:k)';
A = sparse(rows, cols , 1, N, K);
full(A)

The result I got is:

ans = 
    1   1   0   0   0   0   0   1   0   1
    0   0   1   1   0   1   0   0   0   1
    0   0   0   1   1   0   1   0   1   0
    0   1   0   0   0   0   1   0   1   1
    1   1   1   0   0   1   0   0   0   0

This computation should be pretty fast even for large values of K and N. For K = 30, k = 4, N = 10000 the result was obtained in less than 0.01 seconds.

Upvotes: 2

Lee Daniel Crocker
Lee Daniel Crocker

Reputation: 13171

If you have enough memory for nchoosek(K,k) integers, build an array of those, use a partial Fisher-Yates shuffle to get a proper uniformly random subset of N of those. Now, given the array of N integers, interpret each as the rank of the combination representing each row of your final array. If you use colexicographical ordering of combinations, computing the combination from a rank is pretty simple (though it uses lots of binomial combination functions, so it pays to have a fast one).

I'm not a Matlab guy, but I've done things similar to this in C. This code, for example:

for (i = k; i >= 1; --i) {
    while ((b = binomial(n, i)) > r) --n;
    buf[i-1] = n;
    r -= b;
}

will fill the array buf[] with indices from 0 to n-1 for the rth combination of k out of n elements in colex order. You would interpret these as the positions of the 1s in your row.

Upvotes: 0

Nathan Henkel
Nathan Henkel

Reputation: 194

You could use randperm(n) to generate random sequences of integers from 1 to n, and store the nonrepeated sequences as rows in a matrix M until size(unique(M,'rows'),1)==size(M,1). Then you could use M to index a logical matrix with the appropriate number of true values in each row.

Upvotes: 0

Related Questions