Reputation: 65

Simulate data from (non-standard) density function

I want to simulate data from a non-standard density function. I already found the following link (How do I best simulate an arbitrary univariate random variate using its probability function?). However, this gives weird results. Somehow, this cumulative density function ( cdf() ) does not work well. From some values, it gives very strange results. For example, take a look at the following code:

density=function(x)(25*200.7341^25/x^26*exp(-(200.7341/x)^25))
cdf<-function(x) integrate(density,1,x)[[1]]

cdf(9701)
[1] 1

cdf(9702)
[1] 6.33897e-05

So my question, how can I create a "good" CDF function? Or more directly, how can I simulate data from a PDF?

Upvotes: 4

Answers (3)

dickoa

Reputation: 18437

As pointed out by @pjs we can use Rejection sampling (check the wiki for details).

Here is one implementation of this approach.

The most important step is to find a distribution g from which we can sample and from which it exists M such that M * g > f for all point

f <- function(x) (25 * 200.7341^25 / x^26 * exp(-(200.7341/x)^25))
g <- function(x) dnorm(x, mean = 200.7341, sd = 40)
M <- 5
curve(f, 0, 500)
curve(M * g(x), 0, 500, add = TRUE, lty = "dashed")

enter image description here

Now, we can execute the algorithm

set.seed(42)
k <- 1
count <- 0
res <- vector(mode = "numeric", length = 1000)
while(k < 1001) {
          z <- rnorm(n = 1, mean = 200.7341, sd = 40)
          R <- f(z) / (M * g(z))
          if (R > runif(1)) {
              res[k] <- z
              k  <- k + 1
          }
          count <- count + 1
    }

(accept_rate <- (k / count) * 100)
## [1] 19.7086

require(MASS) ## for truehist
truehist(res)
curve(f, 0, 250, add = TRUE)

The acceptance rate is not great. You can try do find a better envelope function or use a Metropolis Hasting algorithm.

Upvotes: 5

pjs

Reputation: 19855

When I played around with your CDF it quickly emerged that most of the action is for x between 180 and 350, which I confirmed by plotting the density over that range.

I'm pretty sure that the results at x = 9702 reflect numerical instability of the computations when you have 25th and 26th powers involved. If you don't trust your CDF or it's not invertible, another pdf-based option is acceptance/rejection. You should be able to use a simple triangle with min = 180, max around 300, and mode around 200 as a bounding function g(x) and follow the algorithm described on Wikipedia to get pretty good results.

In general, if inversion doesn't work for an arbitrary distribution your other choices are 1) acceptance/rejection based on pdf relative to a bounding function, 2) composition (can you deconstruct your distribution into easier-to-generate components and select an appropriate component using conditional probability), or 3) "special tricks" - are there cases where convolution or parameterization gives distributional equivalence (e.g., N(0,1)^2 = chi-square(1), chi-square(k) = sum of k independent chi-square(1)'s, exp(2) = chi-square(2), etc...). See Luc Devroye's book on Non-Uniform Random Variate generation for a comprehensive treatment of your options.

Upvotes: 0

Vincent Zoonekynd

Reputation: 32401

If the integration interval is very large, the peak of the density is very difficult to find: integrate can easily miss it, and think that the function you are integrating is (almost) zero everywhere.

If you know where the peak is, you can cut the integral into three: around the peak, before, and after.

# Density
A <- 200.7341
f <- function(x) 25*A^25 / x^26 * exp( -(A/x)^25 )
a <- 150
b <- 400

# Numeric integration
F1 <- function(x) {
  if( x < a )      integrate(f, 1, x)[[1]] 
  else if( x < b ) integrate(f, 1, a)[[1]] + integrate(f, a, x)[[1]] 
  else             integrate(f, 1, a)[[1]] + integrate(f, a, b)[[1]] + integrate(f, b, x)[[1]] 
}

# Compare with the actual values
F2 <- function(x) exp( -(A/x)^25 )
F1(200); F2(200)
F1(1e4); F2(1e4)
F1(1e5); F2(1e5) # Imprecise if b is too low...

After checking that your interval is sufficiently large, you can remove the "before" and "after" intervals: their contribution is zero.

F1 <- function(x) {
  if( x < a )      0
  else if( x < b ) integrate(f, a, x)[[1]] 
  else             1
}

Upvotes: 4

Simulate data from (non-standard) density function

Answers (3)

Related Questions