Reputation: 1440
I've prepared a vector by sampling a log-normal distribution, setting (by trial and error) the mean and sd parameters so that rlnorm() returns a mean of exactly 20 and an SD of exactly 6 (to 3 decimal places) for any specified set.seed(), as in the following example:
# 10,000 samples from log-normal distribution
set.seed(7)
HcT <- rlnorm(n = 10000, log(19.147), log(1.33832))
# Report mean and sd
paste('The mean of HcT is',round(mean(HcT),3),'and the SD is',round(sd(HcT),3))
[1] "The mean of HcT is 20 and the SD is 6"
However, rather than using trial and error, I would like to 'goal seek' the two parameters. There are several Stack Overflow examples of single-value goal seek, but I'm not sure which function or package to apply in a two-parameter case (mean and SD in the above sample).
Upvotes: 4
Views: 2612
Reputation: 11255
I ran into this problem before, too, and the link below set me straight. rlnorm() doesn't simply use the log of the arithmetic mean and standard deviation. Instead, the function expects mu and sigma (its meanlog and sdlog arguments), which are specific to the log-normal distribution.
Thankfully, the people at this link derived the formulas for us to transform to lognormal distributions.
I'm going to make this less pretty so people go to the link above as they solved this:
m <- 20
s <- 6
data_set <- rlnorm(n=1000000,
meanlog=log(m^2 / sqrt(s^2 + m^2)),
sdlog=sqrt(log(1 + (s^2 / m^2))))
mean(data_set)
sd(data_set)
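For reference, the two expressions passed to rlnorm() follow from matching the moments of the log-normal distribution. A short derivation sketch, where μ and σ are my labels for meanlog and sdlog, and m and s are the target mean and SD as in the code. These are population moments, so a finite sample will only approximate m and s:

```latex
\begin{aligned}
\mathbb{E}[X] &= e^{\mu + \sigma^2/2} = m, \\
\operatorname{Var}[X] &= \left(e^{\sigma^2} - 1\right) e^{2\mu + \sigma^2} = s^2 \\
\Rightarrow\quad \sigma^2 &= \log\!\left(1 + \frac{s^2}{m^2}\right), \\
\mu &= \log m - \frac{\sigma^2}{2} = \log\!\left(\frac{m^2}{\sqrt{s^2 + m^2}}\right).
\end{aligned}
```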
Edit: changed variable from sd to s because sd() is also a function...
Upvotes: 3
Reputation: 226057
It should work OK to minimize the sum of the squared deviations from the target values. There are pitfalls to this approach (see e.g. Numerical Recipes by Press et al.), but it should be OK for simple problems. The following code appears to retrieve the correct answers for your case:
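f <- function(p, seed = 7, target = c(20, 6)) {
  # p holds the two values whose logs are passed to rlnorm(), as in the question
  mu <- log(p[1])
  sigma <- log(p[2])
  set.seed(seed)  # fix the seed so the objective is deterministic
  r <- rlnorm(1e4, mu, sigma)
  # squared distance between the sample (mean, sd) and the target
  sum((c(mean(r), sd(r)) - target)^2)
}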
Choosing some non-ridiculous starting values ({15,2}):
optim(par=c(15,2), fn=f)
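To check what optim() found, it can help to store the result and regenerate the sample with the fitted parameters. The following is a minimal sketch rather than a definitive recipe: it repeats the objective so the block runs on its own, and the names fit and HcT_fit are just illustrative.

```r
# Objective as above: squared distance between sample (mean, sd) and target
f <- function(p, seed = 7, target = c(20, 6)) {
  set.seed(seed)
  r <- rlnorm(1e4, log(p[1]), log(p[2]))
  sum((c(mean(r), sd(r)) - target)^2)
}

fit <- optim(par = c(15, 2), fn = f)  # default method is Nelder-Mead

fit$par          # fitted values, on the same scale as 19.147 and 1.33832 in the question
fit$value        # remaining squared error; should be close to 0
fit$convergence  # 0 means optim() reports successful convergence

# Regenerate the sample with the fitted parameters and the same seed
set.seed(7)
HcT_fit <- rlnorm(1e4, log(fit$par[1]), log(fit$par[2]))
c(mean = mean(HcT_fit), sd = sd(HcT_fit))
```

Because the seed is fixed inside f, the fitted parameters are specific to that seed and sample size; a different seed needs a fresh call to optim().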
Based on @Cole's answer I would have thought this would work perfectly: draw normal deviates, transform them so they have a mean and sd exactly equal to the log-scale values, then exponentiate. But it only works on average or asymptotically (i.e., a large sample converges to the desired mean), not exactly for finite samples. Haven't thought through exactly why this is so.
rlnorm_exact <- function(n, m, sd) {
m2 <- log(m^2 / sqrt(sd^2 + m^2))
sd2 <- sqrt(log(1 + (sd^2 / m^2)))
r <- c(scale(rnorm(n)))
return(exp(sd2*r+m2))
}
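A quick way to see the behaviour described above is to compare the moments on both scales. In this sketch (the variable names target_m, target_s and x are mine), the log-scale mean and SD match their targets up to floating-point error, while the natural-scale values are only close. One plausible explanation is that exp() is nonlinear, so fixing the first two sample moments of the logs does not fix the first two sample moments of the exponentiated values.

```r
rlnorm_exact <- function(n, m, sd) {
  m2 <- log(m^2 / sqrt(sd^2 + m^2))
  sd2 <- sqrt(log(1 + (sd^2 / m^2)))
  r <- c(scale(rnorm(n)))  # standardised to sample mean 0, sample sd 1
  return(exp(sd2 * r + m2))
}

target_m <- 20
target_s <- 6

set.seed(7)
x <- rlnorm_exact(1e4, m = target_m, sd = target_s)

# Log-scale targets, computed the same way as inside the function
m2 <- log(target_m^2 / sqrt(target_s^2 + target_m^2))
sd2 <- sqrt(log(1 + (target_s^2 / target_m^2)))

# Log scale: sample moments agree with the targets
c(sample = mean(log(x)), target = m2)
c(sample = sd(log(x)), target = sd2)

# Natural scale: close to 20 and 6, but not exact
c(mean = mean(x), sd = sd(x))
```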
Upvotes: 4