Reputation: 166
Suppose I have a very large square matrix, M(i, j), such that each element in the matrix represents the probability that the element will be chosen in a weighted random selection. I need to sample n elements from the matrix (by the (i, j) indices) with replacement. The weights will change on every iteration of the main loop.
Currently, I am using something like the following:
for m = 1:M_size
xMean(m) = mean(M(:, m));
end
[~, j_list] = histc(rand(n, 1), cumsum([0; xMean'./sum(xMean)']));
for c = 1:n
[~, i_list(c)] = ...
histc(rand(1, 1), cumsum([0;, M(:, j_list(c))./sum(M(:, j_list(c)))]));
end
But this seems to be a rather clunky method, that also takes a very long time due to the for loop. Is there a more efficient method? Perhaps if I vectorize the matrix in some way?
*Edit I should mention that I do not have access to a statistics toolbox
Many thanks in advance.
Upvotes: 0
Views: 423
Reputation: 12345
I think I would actually solve this by un-vectorizing. That is, remove all of the high level calls and expensive operations and strip it down to the essentials, using only predefined arrays and simple operations.
The core of an algorithm would be:
Determine the sum of the weights
Select n random numbers between 0 and the sum of the weights, sort them.
Manually implement a cumsum loop. However, instead of storing all cumulative sums, just store the indexes where the cumsum jumps from less than the current random number to more than the current random number.
In code (with a bit of a timing rig), that looks like this:
tic
for ixTiming = 1:1000
M = abs(randn(50));
M_size = size(M, 2);
n = 8;
total = sum(M(:));
randIndexes = sort(rand(n,1) * total);
list = zeros(n,1);
ixM = 1;
ixNextList = 1;
curSum = 0;
while ixNextList<=n && ixM<numel(M)
while curSum<randIndexes(ixNextList) && ixM<=numel(M)
curSum = curSum+M(ixM);
ixM = ixM + 1;
end
list(ixNextList) = ixM;
ixNextList = ixNextList+1;
end
[i_list, j_list] = ind2sub(size(M),list);
end
toc; %0.216 sec. on my computer
Compare this to the code in the original question:
tic
for ixTiming = 1:1000
M = abs(randn(50));
M_size = size(M, 2);
n = 8;
for m = 1:M_size
xMean(m) = mean(M(:, m));
end
[~, j_list] = histc(rand(n, 1), cumsum([0; xMean'./sum(xMean)']));
for c = 1:n
[~, i_list(c)] = ...
histc(rand(1, 1), cumsum([0;, M(:, j_list(c))./sum(M(:, j_list(c)))]));
end
end
toc; %1.10 sec on my computer
Caveats and optimizations.
I have not extensively tested this. Random number operations are hard to for propery random behavior. Run a few test cases over a lot of monte carlo sets to make sure the behavior is as expected. Especially watch out for off-by-one type errors.
Profile, and then look for additional improvements in any slow steps. Some possibilities.
Maintain the total
value as you change M
, so you don;t need to recalculate.
Check the lowest and highest value of randIndexes
against 0
and total
. If randIndexes(1) is larger than
total-randIndexes(end), then increment
ixMfrom
numel(M)to
1, rather than from
1to
numel(M)`.
Upvotes: 0
Reputation: 4685
% M is ixj
xMean = transpose(mean(M,1));
%xMean is jx1, so i hope n == j
[~, j_list] = histc(rand(n, 1), cumsum([0; xMean./sum(xMean)]));
% j_list is not used? but is j x 1
cumsumvals = cumsum([zeros(1,jj);, M(:,j_list(1:n))./kron(sum(M(:,j_list(1:n))),ones(ii,1))],1),1)
% cumsumvals is i+1 x j, so looks like it should work
% but histc won't work with a matrix valued edge parameter
% you'll need to look into hist3 for that
for c = 1:n
[~, i_list(c)] = ...
histc(rand(1, 1), cumsumvals(:,c));
end
So it's closer, but you'll need hist3 to make completely vectorized.
Upvotes: 0
Reputation: 633
randsample
(docs) is your friend here. I would use the following method which converts to indexes then back to subscripts:
selected_indexes = randsample(1:numel(M), n, true, M(:));
[sub_i, sub_j] = ind2sub(size(M), selected_indexes);
You might have to do a few transposes on M
to get the appropriate dimensions.
Upvotes: 1