Reputation: 461
I am simulating data using rnorm, but I need to set an upper and lower limit, does anyone know how to do this?
code:
rnorm(n = 10, mean = 39.74, sd = 25.09)
Upper limit needs to be 340, and the lower limit 0
I am asking this question because I am rewriting an SAS-code into an R-code. I have never used SAS. I am trying to rewrite the following piece of code:
sim_sample(simtot=100000,seed=10004,lbound=0,ubound=340,round_y=0.01,round_m=0.01,round_sd=0.01,n=15,m=39.74,sd=25.11,mk=4)
Upvotes: 24
Views: 40311
Reputation: 39657
There are several ways to set upper and lower limits to a normal distribution, what will cause that the result is no longer normal distributed.
Assuming a mean=0
, sd=1
producing N=1e5
values with a lower boundary of LO=-1
and an upper boundary of UP=2
.
N <- 1e5L
LO <- -1
UP <- 2
Move outliers to border (@Roland)
set.seed(42)
x <- pmax(LO, pmin(UP, rnorm(N)))
mean(x)
#[1] 0.07238029
median(x)
#[1] -0.002066374
sd(x)
#[1] 0.8457605
hist(x, 30)
Cut outliers of (@Dason, @Roland, truncnorm::rtruncnorm, MCMCglmm::rtnorm)
set.seed(42)
x <- qnorm(runif(N, pnorm(LO), pnorm(UP)))
mean(x)
#[1] 0.2317875
median(x)
#[1] 0.173679
sd(x)
#[1] 0.7236536
Scale (@Alex Essilfie)
set.seed(42)
x <- rnorm(N)
x <- (x-min(x))/(max(x)-min(x))*(UP-LO)+LO
mean(x)
#[1] 0.4474876
median(x)
#[1] 0.4482257
sd(x)
#[1] 0.3595199
Combination of methods. E.g. Cut and scale:
set.seed(42)
x <- qnorm(runif(N, pnorm(-3), pnorm(3)))
x <- (x-min(x))/(max(x)-min(x))*(UP-LO)+LO
mean(x)
#[1] 0.5010759
median(x)
#[1] 0.5014713
sd(x)
#[1] 0.4957751
Asymmetric combination
set.seed(42)
n <- round(N*abs(LO)/diff(range(c(LO, UP))))
x <- c(qnorm(runif(n, pnorm(-3), 0.5)), qnorm(runif(N-n, 0.5, pnorm(3))))
x <- ifelse(x < 0, x/min(x)*LO, x/max(x)*UP)
mean(x)
#[1] 0.2651627
median(x)
#[1] 0.2127903
sd(x)
#[1] 0.5078264
Upvotes: 0
Reputation: 12613
This is the function that I wrote to achieve the same purpose. It normalizes the result from the rnorm
function and then adjusts it to fit the range.
NOTE: The standard deviation and mean (if specified) get altered during the normalization process.
#' Creates a random normal distribution within the specified bounds.
#'
#' WARNING: This function does not preserve the standard deviation or mean.
#' @param n The number of values to be generated
#' @param mean The mean of the distribution
#' @param sd The standard deviation of the distribution
#' @param lower The lower limit of the distribution
#' @param upper The upper limit of the distribution
rtnorm <- function(n, mean=NA, sd=1, lower=-1, upper=1){
mean = ifelse(is.na(mean)|| mean < lower || mean > upper,
mean(c(lower, upper)), mean)
data <- rnorm(n, mean=m, sd=sd) # data
if (!is.na(lower) && !is.na(upper)){ # adjust data to specified range
drange <- range(data) # data range
irange <- range(lower, upper) # input range
data <- (data - drange[1])/(drange[2] - drange[1]) # normalize data (make it 0 to 1)
data <- (data * (irange[2] - irange[1]))+irange[1] # adjust to specified range
}
return(data)
}
Upvotes: 0
Reputation: 61933
You can make your own truncated normal sampler that doesn't require you to throw out observations quite simply
rtnorm <- function(n, mean, sd, a = -Inf, b = Inf){
qnorm(runif(n, pnorm(a, mean, sd), pnorm(b, mean, sd)), mean, sd)
}
Upvotes: 15
Reputation: 622
The rtruncnorm() function will return the results you need.
library(truncnorm)
rtruncnorm(n=10, a=0, b=340, mean=39.4, sd=25.09)
Upvotes: 32
Reputation: 132706
Like this?
mysamp <- function(n, m, s, lwr, upr, nnorm) {
samp <- rnorm(nnorm, m, s)
samp <- samp[samp >= lwr & samp <= upr]
if (length(samp) >= n) {
return(sample(samp, n))
}
stop(simpleError("Not enough values to sample from. Try increasing nnorm."))
}
set.seed(42)
mysamp(n=10, m=39.74, s=25.09, lwr=0, upr=340, nnorm=1000)
#[1] 58.90437 38.72318 19.64453 20.24153 39.41130 12.80199 59.88558 30.88578 19.66092 32.46025
However, the result is not normal distributed and usually won't have the mean and sd you've specified (in particular if the limits are not symmetric around the specified mean).
According to your comment it seems you want to translate this SAS function. I am not an SAS user, but this should do more or less the same:
mysamp <- function(n, m, s, lwr, upr, rounding) {
samp <- round(rnorm(n, m, s), rounding)
samp[samp < lwr] <- lwr
samp[samp > upr] <- upr
samp
}
set.seed(8)
mysamp(n=10, m=39.74, s=25.09, lwr=0, upr=340, rounding=3)
#[1] 37.618 60.826 28.111 25.920 58.207 37.033 35.467 12.434 0.000 24.857
You may then want to use replicate
to run the simulations. Or if you want faster code:
sim <- matrix(mysamp(n=10*10, m=39.74, s=25.09, lwr=0, upr=340, rounding=3), 10)
means <- colMeans(sim)
sds <- apply(sim, 2, sd)
Upvotes: 10
Reputation: 13122
Assuming you want exactly 10 numbers and not the subset of them that is >0, <340 (and night not be a normal distribution):
aa <- rnorm(n = 10, mean = 39.74, s = 25.09)
while(any(aa<0 | aa>340)) { aa <- rnorm(n = 10, mean = 39.74, s = 25.09) }
Upvotes: 0