Reputation: 163
I am trying to implement this code so it works as quickly as possible.
Say I have a population of 100 different values; to keep things simple, think of it as pop = 1:100 or pop = randn(1,100). I have a vector n that gives the sizes of the samples I want to draw, for example n = [1 3 10 6 2]. I want to take 5 (in general, length(n)) different samples from pop, where the i-th sample consists of n(i) elements drawn without replacement. So for my first sample I want 1 element out of pop, for the second sample I want 3, for the third I want 10, and so on.
To be honest, I am not really interested in which elements are sampled. What I want is the sum of the elements in the i-th sample. This would be trivial with a loop, but I am trying to avoid loops to keep my code as quick as possible. I have to do this for many different populations, and length(n) can be very large.
If I had to do it with a loop, this would be how:
pop = randn(1,100);
n = [1 3 10 6 2];
sum_sample = zeros(length(n),1);
for i = 1:length(n)
    sum_sample(i,1) = sum(randsample(pop,n(i)));
end
Is there a way to do this without a loop?
Upvotes: 2
Views: 148
Reputation: 21563
The only way to find out what is fastest for you is to time the different methods against each other.
In fact the loop appears to be very fast in this case!
pop = randn(1,100);
n = [1 3 10 6 2];
tic
sr = @(n) sum(randsample(pop,n));
sum_sample = arrayfun(sr,n);
toc % elapsed time: about 0.004 seconds
clear su
tic
for t = numel(n):-1:1   % counting down allocates su in full on the first iteration
    su(t) = sum(randsample(pop,n(t)));
end
toc % elapsed time: about 0.003 seconds
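A single tic/toc measurement of a few milliseconds can be noisy, so it may be worth averaging over many repetitions before drawing conclusions. The following is only a sketch of that idea; the repetition count reps and the variable names t_arrayfun and t_loop are assumptions made for illustration, and randsample requires the Statistics and Machine Learning Toolbox, as in the code above.

```matlab
pop = randn(1,100);
n = [1 3 10 6 2];
reps = 1000;   % assumed repetition count, chosen arbitrarily

% Variant 1: arrayfun with an anonymous function
sr = @(k) sum(randsample(pop,k));
tic
for r = 1:reps
    s1 = arrayfun(sr,n);
end
t_arrayfun = toc/reps;   % average seconds per call

% Variant 2: preallocated for loop
tic
for r = 1:reps
    s2 = zeros(1,numel(n));
    for t = 1:numel(n)
        s2(t) = sum(randsample(pop,n(t)));
    end
end
t_loop = toc/reps;       % average seconds per call

fprintf('arrayfun: %.2e s per call, loop: %.2e s per call\n', t_arrayfun, t_loop);
```

MATLAB's built-in timeit function is another option for this kind of measurement. Either way, the numbers depend on the machine and the MATLAB version, so they are best compared on your own data.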
Upvotes: 1
Reputation: 6414
You can do something like this:
pop = randn(1,100);
n = [1 3 10 6 2];
sampled_data_index = randi(length(pop),1,sum(n));
sampled_data = pop(sampled_data_index);
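The randi function draws random integers from a specified range, which makes them suitable for indexing. Once you have the indices, you can use them all at once to sample the data from the pop vector. Note that randi samples with replacement, so the same index can appear more than once.
If you want unique indices, you can replace randi with randperm. This requires sum(n) <= length(pop), and no element is then shared between samples: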
sampled_data_index = randperm(length(pop),sum(n));
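The flat vector of drawn values still has to be split into groups of n(i) elements and summed. One possible way to do that, sketched here under the assumption that sum(n) <= numel(pop), is to label every drawn value with its sample number using repelem (available from R2015a) and sum by label with accumarray. The variable names group and sum_sample are chosen for illustration.

```matlab
pop = randn(1,100);
n = [1 3 10 6 2];

% Assumption: sum(n) <= numel(pop), otherwise randperm raises an error.
sampled_data_index = randperm(numel(pop), sum(n));
sampled_data = pop(sampled_data_index);

% Label each drawn value with its sample number: 1 2 2 2 3 3 ...
group = repelem(1:numel(n), n);

% Sum the values that share a label; the result is numel(n)-by-1.
sum_sample = accumarray(group(:), sampled_data(:), [numel(n) 1]);
```

Because all indices come from one randperm call, the samples are disjoint from each other. That is stricter than drawing each sample independently without replacement, so it may or may not be what you want.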
Finally, you can collect each sample in a cell array using the following code:
pop = randn(1,100);
n = [1 3 10 6 2];
fun = @(m) pop(randperm(length(pop),m));
C = arrayfun(fun,n,'UniformOutput',false)
To get the sum of each sample instead:
funs = @(m) sum(pop(randperm(length(pop),m)));
sumC = arrayfun(funs,n)
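If you want to avoid both the loop and arrayfun while still drawing every sample independently and without replacement, one option is to build a random permutation per row by sorting a matrix of random numbers, then keep the first n(i) entries of row i. This is only a sketch, not a benchmarked solution: it allocates a numel(n)-by-numel(pop) matrix, so for very large length(n) the memory cost may outweigh any speed gain. The names idx, cand, and keep are chosen for illustration.

```matlab
pop = randn(1,100);
n = [1 3 10 6 2];

% Sorting each row of random numbers gives one independent
% random permutation of 1:numel(pop) per row.
[~, idx] = sort(rand(numel(n), numel(pop)), 2);

% Only as many columns as the largest sample are needed.
k = max(n);
cand = reshape(pop(idx(:,1:k)), numel(n), k);   % candidate values

% Row i keeps its first n(i) candidates; the rest are zeroed out.
keep = bsxfun(@le, 1:k, n(:));
cand(~keep) = 0;

sum_sample = sum(cand, 2);   % numel(n)-by-1 vector of sample sums
```

Here sum_sample(i) is the sum of n(i) distinct elements of pop, and different rows may reuse the same elements, which matches the loop version in the question.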
Upvotes: 0
Reputation: 14316
You can create a function handle that chooses the random samples and sums them up. Then you can use arrayfun to execute this function for all values of n:
pop = randn(1,100);
n = [1 3 10 6 2];
sr = @(n) sum(randsample(pop,n));
sum_sample = arrayfun(sr,n);
Upvotes: 0