I am trying to use R to randomly split a sample into groups of fixed size, in the proportions given by designated probabilities, and I need the group sizes to come out exactly the same every time the sample is shuffled. For example, suppose the sample size is 100, there are 4 groups, and the group sizes should be 40, 30, 20 and 10, respectively. My current attempt is shown below:
category_split <- sample(1:4, 100, replace = TRUE, prob = c(0.4, 0.3, 0.2, 0.1))
category_split
# [1] 1 2 3 3 1 1 3 3 2 1 1 2 1 4 2 1 3 2 1 1 1 2 3 4 1 2 2 1 2 2 1 1 1 3 3 4 3 1 2 2 2 3 1 1 3 2 3 1 1 1 4 1 4 1
#[55] 1 2 3 4 1 1 1 1 2 1 3 2 2 3 1 3 3 2 1 4 1 2 1 2 3 2 3 3 1 2 1 2 3 1 1 1 1 1 3 2 3 1 1 1 2 3
table(category_split)
#category_split
# 1 2 3 4
#43 26 24 7
However, because sample() draws each label independently with the given probabilities, the resulting group sizes vary from run to run: they come out close to the stipulated sizes (40, 30, 20, 10), but they are not guaranteed to match them exactly. Is there a way to get a random shuffle with exactly these group sizes, using sample() or any other function in R?
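To see why the counts drift, note that with replace = TRUE each of the 100 labels is drawn independently, so the four group sizes follow a multinomial distribution. The sketch below uses only base R; the seed and the choice of five repetitions are arbitrary, for illustration only. It uses dmultinom() to compute the probability of landing exactly on 40/30/20/10, and replicate() to show how the sizes change between runs.

```r
p <- c(0.4, 0.3, 0.2, 0.1)       # designated probabilities from the question
target <- c(40, 30, 20, 10)      # desired group sizes for n = 100

# Probability that one call to sample(..., replace = TRUE, prob = p)
# produces exactly the target counts (multinomial point probability).
exact_prob <- dmultinom(target, prob = p)
exact_prob

# Five independent runs; each column holds the four group sizes of one run.
set.seed(42)  # arbitrary seed, only so the output is reproducible
counts <- replicate(5, tabulate(sample(1:4, 100, replace = TRUE, prob = p), nbins = 4))
counts
```

The exact-match probability is on the order of 0.1%, so an exact 40/30/20/10 split is the exception rather than the rule, even though every run still sums to 100.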
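First create a vector that contains each group label repeated the required number of times, then shuffle it with sample(). Called without a size argument, sample() returns a random permutation of its input, so the order is random but the counts are preserved exactly: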
category_split = sample(rep(1:4, c(40, 30, 20, 10)))
table(category_split)
#category_split
# 1 2 3 4
#40 30 20 10
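The answer above relies on 100 × prob being whole numbers. When they are not (for example n = 101), the sizes first have to be rounded so that they still add up to n. One possible sketch uses a hypothetical helper, exact_group_labels(), together with largest-remainder rounding; both the helper name and the rounding rule are assumptions, not part of the original answer.

```r
# Hypothetical helper (not part of base R): turn proportions into exact
# integer group sizes that sum to n, then return the labels in random order.
exact_group_labels <- function(n, prob) {
  prob <- prob / sum(prob)              # normalise in case prob does not sum to 1
  raw <- n * prob                       # ideal, possibly fractional, sizes
  sizes <- floor(raw)                   # round every group down first
  shortfall <- n - sum(sizes)           # units still to hand out
  if (shortfall > 0) {
    # Largest-remainder rule: give one extra unit to each of the groups
    # with the biggest fractional parts (order() is stable, so ties go
    # to the earlier group).
    extra <- order(raw - sizes, decreasing = TRUE)[seq_len(shortfall)]
    sizes[extra] <- sizes[extra] + 1
  }
  labels <- rep(seq_along(prob), sizes)
  labels[sample.int(length(labels))]    # random permutation, exact counts
}

set.seed(123)  # arbitrary seed for reproducibility
grp100 <- exact_group_labels(100, c(0.4, 0.3, 0.2, 0.1))
table(grp100)  # always 40 30 20 10
grp101 <- exact_group_labels(101, c(0.4, 0.3, 0.2, 0.1))
table(grp101)  # 41 30 20 10: the extra unit goes to the largest remainder (0.4)
```

Indexing with sample.int(length(labels)) rather than calling sample(labels) sidesteps R's documented quirk where sample(x) on a single number x samples from 1:x. To split actual data, for instance a data frame df with one row per sampled unit, split(df, grp100) returns one data frame per group.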