Reputation: 4996
Let's say I have a population like {1,2,3, ..., 23} and I want to generate a sample so that the sample's mean equals 6.
I tried to use the sample function with a custom probability vector, but it didn't work:
population <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23)
mean(population)
minimum <- min(population)
maximum <- max(population)
amplitude <- maximum - minimum
expected <- 6
n <- length(population)
# Triangular weights: rise from 0 at the minimum to 1 at `expected`,
# then fall back to 0 at the maximum.
prob.vector <- numeric(n)
for (i in seq_len(n)) {
  x <- population[i]  # use the population value, not the loop index
  if (expected > x) {
    prob.vector[i] <- (x - minimum) / (expected - minimum)
  } else {
    prob.vector[i] <- (maximum - x) / (maximum - expected)
  }
}
sample.size <- 5
# Named `smp` so the result does not mask the sample() function
smp <- sample(population, sample.size, prob = prob.vector)
mean(smp)
The sample mean stays close to the population mean (it oscillates around 12), but I wanted it to be around 6.
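A quick way to check what these weights actually imply is to compute the mean of a single weighted draw, sum(w * x) / sum(w). The sketch below rebuilds the same triangular weights in vectorised form (the names `w` and `implied_mean` are only illustrative) and shows that a peak at 6 does not give a mean of 6, because the falling side of the triangle covers many more values than the rising side. It describes one weighted draw only; `sample` draws without replacement by default, which this calculation does not cover.

```r
population <- 1:23
expected <- 6
lo <- min(population)
hi <- max(population)

# Same triangular weights as in the loop above, vectorised
w <- ifelse(population < expected,
            (population - lo) / (expected - lo),
            (hi - population) / (hi - expected))

# Mean of one draw taken with probabilities proportional to w
implied_mean <- sum(w * population) / sum(w)
implied_mean  # 10
```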
A good sample would be:
The problem is different from "sample integer values in R with specific mean" because I have a specific population: I can't just generate arbitrary real numbers, since every sampled value must come from the population.
The plot of the probability vector:
Upvotes: 3
Views: 759
Reputation: 79188
You can try this:
m = local({
  b = combn(1:23, 5)
  d = colMeans(b)
  e = b[, d > 5.5 & d < 6.5]
  function() sample(e[, sample(ncol(e), 1)])
})
m()
[1] 8 5 6 9 3
m()
[1] 6 4 5 3 13
Breakdown:
b=combn(1:23,5) # all combinations of 5 values from 1:23, one combination per column
d = colMeans(b) # the mean of each combination
e = b[,d>5.5 &d<6.5] # keep only the combinations whose mean is within 0.5 of 6
sample(e[,sample(ncol(e),1)]) # pick one of those combinations at random and shuffle its values
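If the mean has to be exactly 6 rather than within 0.5 of it, the same enumeration idea can be used with a stricter filter. This is only a sketch under that assumption: the mean of 5 integers equals 6 exactly when they sum to 30, so the filter is applied to the column sums. The names `target`, `exact` and `draw_exact` are hypothetical and not part of the code above.

```r
population <- 1:23
sample_size <- 5
target <- 6

# One combination per column: 5 rows, choose(23, 5) columns
combos <- combn(population, sample_size)

# Keep only the combinations whose sum is target * sample_size (here 30)
exact <- combos[, colSums(combos) == target * sample_size, drop = FALSE]

# Pick one qualifying combination at random and shuffle its values
draw_exact <- function() {
  sample(exact[, sample.int(ncol(exact), 1)])
}

set.seed(42)
x <- draw_exact()
mean(x)  # 6
```

Enumerating every combination is fine for 23 values and a sample of 5, but the number of columns grows quickly with larger populations or sample sizes.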
Upvotes: 2