Jessica Jin
Jessica Jin

Reputation: 205

Issue abour integrate() function in R

I have one question about integrate() function in R. When I was using intergrate function to find the posterior mean, the result was very small (smaller than the absolute error). If I change the upper limit from Inf to 0.2 or a smaller number, the result number makes more sense. Why is the integrate function not working properly for larger upper limit?

Thanks so much!

The following is my code:

dat = read.table("c://AAPL_logret.txt")$V1
mu = 0.005
theta = 200

posterior = function (sigma2, dat)
{
  numerator = function (sigma2, dat)
  {
    out = dexp(sigma2,theta)
    for (i in 1:length(dat))
    {
     out = out * dnorm(dat[i],mu, sqrt(sigma2))
    }
    return(out)   

  }

  denominator = integrate(numerator,lower = 0, upper = Inf, dat=dat)$value

  return (numerator(sigma2,dat)/denominator)
}

curve(posterior(x,dat), from =0, to = 0.006,xlab=expression(sigma^2), 
      ylab = "Density", cex.axis = 1.3, cex.lab=1.3, col=4,lwd=1.5,lty=4,n=20000)

posterior_times_sigma2 = function(sigma2,dat)
{
  return(sigma2 * posterior(sigma2,dat))
}

posteriormean = integrate(posterior_times_sigma2, lower =0, upper= Inf, dat=dat)$value

Upvotes: 2

Views: 228

Answers (1)

renato vitolo
renato vitolo

Reputation: 1754

You have not posted the content of the "AAPL_logret.txt" file, so I generated bogus data which might look like that, e.g.:

dat <- rlnorm(200, -4, 1)

With this data, your curve() plot shows that most of the probability mass of the posterior distribution is contained between 0 and 0.006:

 1-integrate(posterior, lower=0, upper=0.006, dat=dat)$value

is about 4e-11, therefore the posterior is numerically undistinguishable from zero beyond 0.006. The R help page of integrate() explains what goes wrong here:

"Like all numerical integration routines, these evaluate the function on a finite set of points. If the function is approximately constant (in particular, zero) over nearly all its range it is possible that the result and error estimate may be seriously wrong."

I believe that there is no general solution here: in any given situation you need to explore different options. In your case, one possibility is to restrict the integration interval:

integrate(posterior_times_sigma2, lower=0, upper=Inf, dat=dat)$value
integrate(posterior_times_sigma2, lower=0, upper=0.2, dat=dat)$value
integrate(posterior_times_sigma2, lower=0, upper=0.1, dat=dat)$value
integrate(posterior_times_sigma2, lower=0, upper=0.006, dat=dat)$value

Different integration methods might help as well, such as those provided by the pracma package:

library(pracma)
integral(posterior_times_sigma2, xmin=0, xmax=Inf, dat=dat)

This is the whole script:

##dat = read.table("c://AAPL_logret.txt")$V1

set.seed(1233)
dat <- rlnorm(100, -3, 0.1)

mu= 0.005
theta = 200

posterior = function (sigma2, dat){
numerator = function (sigma2, dat)  {
    out = dexp(sigma2, theta)
    for (i in 1:length(dat))    {
    out = out * dnorm(dat[i], mu, sqrt(sigma2))
    }
    return(out)   
}

denominator = integrate(numerator,lower = 0, upper = Inf, dat=dat)$value
return (numerator(sigma2,dat)/denominator)
}

curve(posterior(x,dat), from=0, to = 0.006,
  xlab=expression(sigma^2), ylab = "Density", cex.axis = 1.3,
  cex.lab=1.3, col=4, lwd=1.5, lty=4, n=20000)

integrate(posterior, lower=0, upper=0.006, dat=dat)$value

posterior_times_sigma2 = function(sigma2,dat){
return(sigma2 * posterior(sigma2,dat))
}

integrate(posterior_times_sigma2, lower=0, upper=Inf, dat=dat)$value
integrate(posterior_times_sigma2, lower=0, upper=0.2, dat=dat)$value
integrate(posterior_times_sigma2, lower=0, upper=0.1, dat=dat)$value
integrate(posterior_times_sigma2, lower=0, upper=0.006, dat=dat)$value



library(pracma)
integral(posterior_times_sigma2, xmin=0, xmax=Inf, dat=dat)
integral(posterior_times_sigma2, xmin=0, xmax=Inf, dat=dat, method="Kronrod")

Upvotes: 3

Related Questions