Reputation: 2034
I am aware of MATLAB's datasample, which lets you draw k samples from a given population. Suppose population=[1,2,3,4] and I want to sample uniformly, with replacement, k=5 times from it. Then:
datasample(population,k)
ans =
1 3 2 4 1
Now, I want to repeat the above experiment N=10000 times without using a for loop. I tried doing:
datasample(repmat(population,N,1),5,2)
But the output I get is (just a short excerpt below):
1 3 2 1 3
1 3 2 1 3
1 3 2 1 3
1 3 2 1 3
1 3 2 1 3
1 3 2 1 3
1 3 2 1 3
1 3 2 1 3
1 3 2 1 3
Every row (the result of one experiment) is the same! But obviously they should be different... It's as though some random seed is not updating between rows. How can I fix this? Or is there some other method I could use that avoids a for loop? Thanks!
Upvotes: 3
Views: 1250
Reputation: 104535
You seem to be misunderstanding how datasample works. According to the documentation, when you pass a matrix, datasample does not sample each row independently: it picks whole slices of the matrix along one dimension (rows by default, or columns when the third argument is 2, as in your call). Because repmat makes every row of the matrix identical, the five columns that get picked contain the same value in every row, so every row of the output is the same. That is why you are getting that "error".
As such, I wouldn't use datasample here if it is your intention to avoid looping. You can use datasample, but you'd have to loop over each call, and you explicitly said that this is not what you want.
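To see the behaviour described above, here is a minimal sketch that asks datasample for its second output, the indices it drew. It assumes the Statistics and Machine Learning Toolbox is available; the variable names data, y and idx are only for illustration.

```matlab
population = [1 2 3 4];
N = 3;                                % a few "trials" are enough to see the effect
data = repmat(population, N, 1);      % N identical rows

% Sample 5 columns (dim = 2) with replacement; idx holds the chosen columns
[y, idx] = datasample(data, 5, 2);

disp(idx)                             % one set of 5 column indices, drawn once
sameAsIndexing = isequal(y, data(:, idx));            % y is just data(:, idx)
allRowsEqual   = isequal(y, repmat(y(1, :), N, 1));   % so every row matches
```

Only one set of indices is drawn for the whole matrix, so both flags come out true no matter how large N is.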
What I would recommend is to first create your population vector with whatever you desire in it, then generate a random index matrix where each value is between 1 and the number of elements in population. This matrix has one row per trial and one column per sample. Once you have created it, simply use it to index into your vector to obtain the desired sampling matrix. To generate this random index matrix, randi is a fine choice.
Something like this comes to mind:
N = 10000; %// Number of trials
M = 5; %// Number of samples per trial
population = 1:4; %// Population vector
%// Generate random indices
ind = randi(numel(population), N, M);
%// Get the stuff
out = population(ind);
Here are the first 10 rows of the output:
>> out(1:10,:)
ans =
4 3 1 4 2
4 4 1 3 4
3 2 2 2 3
1 4 2 2 2
1 2 3 4 2
2 2 3 2 1
4 1 3 2 4
1 4 1 3 1
1 1 2 4 4
1 2 4 2 1
I think the above does what you want. Also keep in mind that the above code generalizes to any population vector you want. You simply have to change the vector and it will work as advertised.
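As a rough illustration of that generalization, the sketch below uses an arbitrary, made-up population and then checks the shape of the result and the empirical frequency of each element. The values in population and the names counts and freq are assumptions for the example only.

```matlab
population = [2.5 10 -3 7 42];        % hypothetical population, any values work
N = 10000;                            % number of trials
M = 5;                                % samples per trial

ind = randi(numel(population), N, M); % uniform indices, one row per trial
out = population(ind);                % indexing a vector with a matrix keeps the N-by-M shape

% Count how often each index was drawn; each should be close to 1/numel(population)
counts = accumarray(ind(:), 1, [numel(population) 1]);
freq = counts / numel(ind);
disp(freq.')
```

With N*M = 50000 draws, each entry of freq should land near 0.2, which is a quick sanity check that the sampling is uniform.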
Upvotes: 5
Reputation: 6187
OK, so both of the current answers say don't use datasample and use randi instead. However, I have a solution for you with datasample and arrayfun.
>> population = [1 2 3 4];
>> k = 5; % Number of samples
>> n = 1000; % Number of times to execute datasample(population, k)
>> s = arrayfun(@(m) datasample(population, m), k*ones(n, 1), 'UniformOutput', false); % n calls, k samples each
>> s = cell2mat(s);
s =
1 4 1 4 4
4 1 2 2 4
2 4 1 2 1
1 4 3 3 1
4 3 2 3 2
We need to pass 'UniformOutput', false to arrayfun because each call returns a vector rather than a scalar. The cell2mat call is needed because the result of arrayfun is then a cell array.
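To make the shapes concrete, here is a small sketch of the same idea with only a few trials. It assumes one datasample call per trial (so the input to arrayfun has n elements, each equal to k) and that the Statistics and Machine Learning Toolbox is installed; the name c is just illustrative.

```matlab
population = [1 2 3 4];
k = 5;                                % samples per trial
n = 3;                                % number of trials, kept small for display

% One call per element of k*ones(n,1); each call returns a 1-by-k row vector
c = arrayfun(@(m) datasample(population, m), k*ones(n, 1), 'UniformOutput', false);

disp(size(c))                         % n-by-1 cell array
disp(size(c{1}))                      % each cell holds a 1-by-k vector
s = cell2mat(c);                      % stack the rows into an n-by-k matrix
```

Note that arrayfun still calls datasample once per trial internally, so this hides the loop rather than removing it.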
Upvotes: 3
Reputation: 36710
datasample interprets each column of your data as one element of your population, sampling among all columns.
To fix this you could call datasample N times in a loop. Instead, I would use randi:
population(randi(numel(population),N,5))
Assuming your population is always 1:p, you could simplify to:
randi(p,N,5)
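A quick way to convince yourself that the two forms agree when population is 1:p is to reset the random number generator before each call. This is only a sketch; the seed value 0 and the names a, b and same are arbitrary choices for the example.

```matlab
p = 4;
population = 1:p;
N = 10000;

rng(0);                                           % fix the seed
a = population(randi(numel(population), N, 5));   % general form

rng(0);                                           % same seed again
b = randi(p, N, 5);                               % simplified form

same = isequal(a, b);                             % true: indexing 1:p returns the index itself
```

For any other population, only the first form applies, since the drawn indices then differ from the sampled values.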
Upvotes: 4