Reputation: 233
I have a sparse logical matrix, which is quite large. I would like to draw random non-zero elements from it without storing all of its non-zero elements in a separate vector (eg. by using find command). Is there an easy way to do this?
Currently I am implementing rejection sampling, which is drawing a random element and checking whether that is non-zero or not. But it is not efficient when the ratio of non-zero elements is small.
Upvotes: 4
Views: 1646
Reputation: 20560
By representing the entries in a 3 column format, aka a coordinate list (i, j, value), you can simply select the items from the list. To get this, you can either use your original method for creating the sparse matrix (i.e. the precursor to sparse()
), or use the find
command, a la [i,j,s] = find(S);
If you don't need the entries, and it seems you don't, you can just extract i
and j
.
If, for some reason, your matrix is massive and your RAM limitations are severe, you can simply divide the matrix into regions, and let the probability of selecting a given sub-matrix be proportional to the number of non-zero elements (using nnz
) in that sub-matrix. You could go so far as to divide the matrix into individual columns, and the rest of the calculation is trivial. NB: by applying sum
to the matrix, you can get the per-column counts (assuming your entries are just 1s).
In this way, you need not even bother with rejection sampling (which seems pointless to me in this case, since Matlab knows where all of the non-zero entries are).
Upvotes: 0
Reputation: 446
An n x m matrix with nnz non-zero elements requires nnz + n + 1 integers to store the locations of its non-zero entries. For a logical matrix there is no need to store the value of the non-zero entries: these are all true. Correspondingly, you would do best to convert your logical sparse matrix into a list of the linear indices of its non-zero entries, together with n and m, which requires only nnz + 2 integers of storage. From these (and ind2sub) you can readily reconstruct the subscripts corresponding to any non-zero entry that you choose randomly using randi over the range 1..nnz
Upvotes: 1
Reputation: 74940
A sparse logical matrix is not a very practical representation of your data if you want to pick random locations. Rejection sampling and find
are the only two ways that make sense to me. Here's how you can do them efficiently (assuming you want to get 4 random locations):
%# using find
idx = find(S);
%# draw 4 without replacement
fourRandomIdx = idx(randperm(length(idx),4));
%# draw 4 with replacement
fourRandomIdx = idx(randi(1,length(idx),4));
%# get row, column values
[row,col] = ind2sub(size(S),fourRandomIdx);
%# using rejection sampling
density = nnz(S)/prod(size(S));
%# estimate how many samples you need to get at least 4 hits
%# and multiply by 2 (or 3)
n = ceil( 1 / (1-(1-density)^4) ) * 2;
%# random indices w/ replacement
randIdx = randi(1,n,prod(size(S)));
%# identify the first four non-zero elements
[row,col] = find(S(randIdx),4,'first');
Upvotes: 1
Reputation: 27028
find is the standard interface to get the non-zero elements in a sparse matrix. Have a look here http://www.mathworks.se/help/techdoc/math/f6-9182.html#f6-13040
[i,j,s] = find(S)
find returns the row indices of nonzero values in vector i, the column indices in vector j, and the nonzero values themselves in the vector s.
No need to get s. Just pick a random index in i,j.
Upvotes: 1