R Generate Bounded Random Sample Around Specific Mean

Question

I've been stuck with this for a while, so I decided to write a question.

Problem: How do I generate a random sample (of length n) with a lower/upper bound and around a specific mean?

Note: the distribution doesn't need to be a specific one (it could be normal, beta, etc.).

Approaches considered:

One approach is to use the rtnorm function (package msm), which generates random numbers from a normal distribution truncated to the specified bounds, but the resulting sample does not have the mean I want.

A second approach I've tried is this function, which I found in a question I can't find anymore:

rBootstrap <- function(n, mean, sd, lowerBound, upperBound){
  range <- upperBound - lowerBound
  m <- (mean-lowerBound) / range #mapping mean to 0-1 range
  s <- sd / range #mapping sd to 0-1 range
  a <- (m^2 - m^3 - m*s^2)/s^2 #calculating alpha for rbeta 
  b <- (m-2*m^2+m^3-s^2+m*s^2)/s^2 #calculating beta for rbeta
  data <- rbeta(n,a,b)  #generating data
  data <- lowerBound + data * range #remapping to given bounds
  return(data)
}
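The a and b lines in rBootstrap are the method-of-moments solution for a Beta distribution on [0, 1] with mean m and standard deviation s. Written out (a standard derivation added for reference; alpha and beta here are the rbeta shape parameters the code calls a and b):

```latex
% Beta(\alpha, \beta) moments
m = \frac{\alpha}{\alpha + \beta}, \qquad
s^2 = \frac{\alpha\beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}
% solving for the shape parameters, with k = \alpha + \beta
k = \frac{m(1 - m)}{s^2} - 1, \qquad
\alpha = m\,k = \frac{m^2 - m^3 - m s^2}{s^2}, \qquad
\beta = (1 - m)\,k = \frac{m - 2m^2 + m^3 - s^2 + m s^2}{s^2}
```

Both shape parameters are positive only when s^2 < m(1 - m), and when alpha < 1 the density is unbounded at the lower bound, which tends to happen when the mean sits close to lowerBound and sd is large.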

This function actually gives great results unless upperBound > lowerBound + 2*(mean - lowerBound), i.e. unless the distance from lowerBound to upperBound exceeds twice the distance from lowerBound to the mean (the mean lies in the lower half of the range).

Specifically, I would like to generate a random sample of length 1,800 with values between 50,000 and 250,000 and a mean of 70,000.
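One way to see what goes wrong in this target case is to look at the Beta shape parameters rBootstrap would compute. A minimal sketch: betaParams is a hypothetical helper that repeats rBootstrap's mapping but returns the parameters instead of sampling, and the sd values 12,000 and 24,000 are assumptions (the question does not fix an sd; these are the values tried in the answer).

```r
# Hypothetical helper: same alpha/beta mapping as rBootstrap, returned instead of sampled
betaParams <- function(mean, sd, lowerBound, upperBound) {
  range <- upperBound - lowerBound
  m <- (mean - lowerBound) / range  # mean mapped to [0, 1]
  s <- sd / range                   # sd mapped to the [0, 1] scale
  k <- m * (1 - m) / s^2 - 1        # alpha + beta (method of moments)
  c(a = m * k, b = (1 - m) * k)
}

# sd values are assumptions, not given in the question
p12 <- betaParams(70000, 12000, 50000, 250000)
p24 <- betaParams(70000, 24000, 50000, 250000)
print(p12)  # a = 2.4, b = 21.6: bell-like, right-skewed
print(p24)  # a = 0.525, b = 4.725: a < 1, density is unbounded at the lower bound
```

So with this parameterization the mean is always honoured in expectation; whether the shape is acceptable depends on sd as well as on where the mean sits.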

Severin Pappadeux · Accepted Answer

You should use a truncated normal distribution, but the mean has to be recalibrated. The documentation of rtnorm states it clearly: mean is the mean of the original normal distribution before truncation.

If you want the OBSERVABLE mean to equal the desired value, just use the formula from the Truncated Normal wiki article, solved for mu:

mu = E + sigma*(f(b) - f(a))/(F(b) - F(a))

Here E is the mean you want to observe (70,000 in your case), f(x) is the standard normal density, F(x) is the standard normal cumulative distribution function, and a and b are the interval boundaries, centered and scaled (LB and RB being the lower and upper bounds):

a = (LB - mu)/sigma
b = (RB - mu)/sigma
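Rearranged, the formula above is the wiki expression for the observable mean, E = mu + sigma*(f(a) - f(b))/(F(b) - F(a)). A quick way to sanity-check it is to compare it with direct numerical integration. The sketch below uses only base R (dnorm, pnorm, integrate); truncMean and truncMeanNumeric are hypothetical helper names, and mu = 60,000, sigma = 24,000 are arbitrary illustrative values.

```r
# Observable mean of N(mu, sigma^2) truncated to [LB, RB], closed form
truncMean <- function(mu, sigma, LB, RB) {
  a <- (LB - mu) / sigma          # centered and scaled lower bound
  b <- (RB - mu) / sigma          # centered and scaled upper bound
  Z <- pnorm(b) - pnorm(a)        # probability mass kept by the truncation
  mu + sigma * (dnorm(a) - dnorm(b)) / Z
}

# Same quantity by numerical integration of x * density, renormalized by Z
truncMeanNumeric <- function(mu, sigma, LB, RB) {
  Z <- pnorm(RB, mu, sigma) - pnorm(LB, mu, sigma)
  integrate(function(x) x * dnorm(x, mu, sigma), LB, RB)$value / Z
}

closed  <- truncMean(60000, 24000, 50000, 250000)
numeric <- truncMeanNumeric(60000, 24000, 50000, 250000)
print(c(closed = closed, numeric = numeric))  # both above 60,000: the left tail was cut off
```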

After you have computed mu, pass it to rtnorm as the mean parameter.

NB: you might want to do a similar exercise with sigma: the sd that goes into rtnorm is NOT the standard deviation you will observe in the sample (again, see the wiki reference).
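To see how far the observed spread drifts from the sigma passed to rtnorm, the wiki variance formula for the truncated normal, sigma^2 * [1 + (a f(a) - b f(b))/Z - ((f(a) - f(b))/Z)^2] with Z = F(b) - F(a), can be coded and cross-checked numerically. A sketch in base R; truncSD and truncSDNumeric are hypothetical names and the parameter values are illustrative.

```r
# Observed sd of N(mu, sigma^2) truncated to [LB, RB], closed form
truncSD <- function(mu, sigma, LB, RB) {
  a <- (LB - mu) / sigma
  b <- (RB - mu) / sigma
  Z <- pnorm(b) - pnorm(a)
  d <- (dnorm(a) - dnorm(b)) / Z   # mean shift in units of sigma
  v <- sigma^2 * (1 + (a * dnorm(a) - b * dnorm(b)) / Z - d^2)
  sqrt(v)
}

# Same sd by numerical integration: mean first, then variance around it
truncSDNumeric <- function(mu, sigma, LB, RB) {
  Z <- pnorm(RB, mu, sigma) - pnorm(LB, mu, sigma)
  m <- integrate(function(x) x * dnorm(x, mu, sigma), LB, RB)$value / Z
  v <- integrate(function(x) (x - m)^2 * dnorm(x, mu, sigma), LB, RB)$value / Z
  sqrt(v)
}

closedSD  <- truncSD(60000, 24000, 50000, 250000)
numericSD <- truncSDNumeric(60000, 24000, 50000, 250000)
print(c(closed = closedSD, numeric = numericSD))  # both noticeably below 24,000
```

Matching a target sd as well would mean solving for mu and sigma together, which this sketch does not attempt.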

UPDATE

OK, I got to the code myself, though the first cut is in Python (I'm looking into R right now). The problem is that, for a given observable mean, mu appears inside f(a), f(b), F(a) and F(b), which turns this into finding the root of a non-linear equation. It is solvable, though; please check the code. Note that it follows the wiki notation fairly closely.
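Written out in the same notation, with a and b now depending on mu, the equation to solve is:

```latex
g(\mu) = E - \mu - \sigma\,\frac{f(a(\mu)) - f(b(\mu))}{F(b(\mu)) - F(a(\mu))} = 0,
\qquad a(\mu) = \frac{LB - \mu}{\sigma}, \quad b(\mu) = \frac{RB - \mu}{\sigma}
```

Because the truncated-normal mean is strictly increasing in mu, g has a single root for any E strictly between LB and RB, so a bracketing solver such as R's uniroot is enough.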

For example, with your parameters and sigma = 12,000, I got:

Found mu = 68430.372119287 for the desired mean 70000.0 and sigma 12000.0
Sampled 100000 truncated gaussians and got observed mean = 70023.15990337673

With your parameters and sigma = 24,000, I got:

Found mu = 52275.475000378945 for the desired mean 70000.0 and sigma 24000.0
Sampled 100000 truncated gaussians and got observed mean = 69922.16000288539

So mu moves close to the left boundary for large sigma, which is expected, but the observed mean stays close to 70,000, which is what you want.
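The R code in UPDATE II brackets the root in c(a, b), which works for sigma = 12,000 and 24,000 but fails once sigma is large enough that the required mu falls below the lower bound (uniroot then stops with "f() values at end points not of opposite sign"). A sketch that widens the bracket downwards, to a - 10*sigma (an arbitrary choice that assumes the root lies within ten sigmas below a), and computes Z from upper tails so it does not round to zero when mu is far below a. truncMean and solveMu are hypothetical names and the sigma grid is illustrative.

```r
# Observable mean of N(mu, sigma^2) truncated to [a, b].
# Z = Phi(beta) - Phi(alfa), written with upper tails to stay accurate when mu << a.
truncMean <- function(mu, sigma, a, b) {
  alfa <- (a - mu) / sigma
  beta <- (b - mu) / sigma
  Z <- pnorm(alfa, lower.tail = FALSE) - pnorm(beta, lower.tail = FALSE)
  mu + sigma * (dnorm(alfa) - dnorm(beta)) / Z
}

# Hypothetical helper: solve truncMean(mu) = target for mu with a widened bracket
solveMu <- function(target, sigma, a, b) {
  uniroot(function(mu) target - truncMean(mu, sigma, a, b),
          interval = c(a - 10 * sigma, b), tol = 1e-6)$root
}

sigmas <- c(12000, 24000, 48000, 96000)
mus <- sapply(sigmas, solveMu, target = 70000, a = 50000, b = 250000)
print(data.frame(sigma = sigmas, mu = mus))
```

For the first two sigmas this should agree with the mu values reported above up to solver tolerance; for the larger ones mu lands well below 50,000 while the truncated mean is still 70,000.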

UPDATE II

Here is the R code (it is also in a GitHub repo):

library(msm)  # provides rtnorm(); uniroot() is in base R's stats package, so rootSolve is not needed

# standard normal density and CDF, named as in the wiki notation
phi <- function(z) {
    dnorm(z)
}

Phi <- function(z) {
    pnorm(z)
}

# observable (after truncation) mean of a normal(mu, sigma) restricted to [a, b]
Mean <- function(mu, sigma, a, b) {
    alfa <-  (a - mu) / sigma
    beta <-  (b - mu) / sigma

    Z <-  Phi(beta) - Phi(alfa)

    mu + sigma*(phi(alfa) - phi(beta))/Z
}

# residual that uniroot drives to zero: desired mean minus truncated mean
f <- function(mu, mean, sigma, a, b) {
    mean - Mean(mu, sigma, a, b)
}

a <-  50000.0
b <-  250000.0
mean  <- 70000.0
sigma <- 24000.0

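# find mu for the desired mean; the bracket c(a, b) assumes the root lies in [a, b],
# which holds for this sigma but not for much larger ones
q <- uniroot(f, c(a, b), mean = mean, sigma = sigma, a = a, b = b)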
mu <- q$root

print(sprintf("Found mu = %f for the desired mean %f and sigma %f", mu, mean, sigma))

# sampling test
set.seed(32345)
N <- 100000L
r <- rtnorm(N, mean=mu, sd=sigma, lower=a, upper=b)

print(sprintf("Sampled %d truncated gaussians and got observed mean = %f", N, mean(r)))
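If you would rather not depend on msm, a truncated normal can also be drawn by inverse-CDF sampling in base R (a swapped-in technique, not what the code above uses): draw uniforms between F(a) and F(b) and map them back with qnorm. A sketch for the question's actual request (n = 1,800 on [50,000, 250,000], target mean 70,000); sigma = 24,000 is an assumption, and truncMean and rtruncnormInv are hypothetical names.

```r
# Observable mean of N(mu, sigma^2) truncated to [a, b] (same formula as Mean() above)
truncMean <- function(mu, sigma, a, b) {
  alfa <- (a - mu) / sigma
  beta <- (b - mu) / sigma
  mu + sigma * (dnorm(alfa) - dnorm(beta)) / (pnorm(beta) - pnorm(alfa))
}

# Inverse-CDF sampling: uniforms between F(a) and F(b), mapped back with qnorm
rtruncnormInv <- function(n, mu, sigma, a, b) {
  u <- runif(n, pnorm(a, mu, sigma), pnorm(b, mu, sigma))
  qnorm(u, mu, sigma)
}

a <- 50000; b <- 250000; target <- 70000; sigma <- 24000  # sigma is an assumption
mu <- uniroot(function(m) target - truncMean(m, sigma, a, b), c(a, b))$root

set.seed(1)
x <- rtruncnormInv(1800, mu, sigma, a, b)
print(c(min = min(x), max = max(x), mean = mean(x)))
```

This is simple and exact in principle, but it loses accuracy when [a, b] lies far out in one tail of N(mu, sigma^2), because pnorm(a) and pnorm(b) then both round towards 0 or 1.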
