Reputation: 205
I have one question about integrate() function in R. When I was using intergrate function to find the posterior mean, the result was very small (smaller than the absolute error). If I change the upper limit from Inf to 0.2 or a smaller number, the result number makes more sense. Why is the integrate function not working properly for larger upper limit?
Thanks so much!
The following is my code:
dat = read.table("c://AAPL_logret.txt")$V1
mu = 0.005
theta = 200
posterior = function (sigma2, dat)
{
numerator = function (sigma2, dat)
{
out = dexp(sigma2,theta)
for (i in 1:length(dat))
{
out = out * dnorm(dat[i],mu, sqrt(sigma2))
}
return(out)
}
denominator = integrate(numerator,lower = 0, upper = Inf, dat=dat)$value
return (numerator(sigma2,dat)/denominator)
}
curve(posterior(x,dat), from =0, to = 0.006,xlab=expression(sigma^2),
ylab = "Density", cex.axis = 1.3, cex.lab=1.3, col=4,lwd=1.5,lty=4,n=20000)
posterior_times_sigma2 = function(sigma2,dat)
{
return(sigma2 * posterior(sigma2,dat))
}
posteriormean = integrate(posterior_times_sigma2, lower =0, upper= Inf, dat=dat)$value
Upvotes: 2
Views: 228
Reputation: 1754
You have not posted the content of the "AAPL_logret.txt" file, so I generated bogus data which might look like that, e.g.:
dat <- rlnorm(200, -4, 1)
With this data, your curve() plot shows that most of the probability mass of the posterior distribution is contained between 0 and 0.006:
1-integrate(posterior, lower=0, upper=0.006, dat=dat)$value
is about 4e-11, therefore the posterior is numerically undistinguishable from zero beyond 0.006. The R help page of integrate() explains what goes wrong here:
"Like all numerical integration routines, these evaluate the function on a finite set of points. If the function is approximately constant (in particular, zero) over nearly all its range it is possible that the result and error estimate may be seriously wrong."
I believe that there is no general solution here: in any given situation you need to explore different options. In your case, one possibility is to restrict the integration interval:
integrate(posterior_times_sigma2, lower=0, upper=Inf, dat=dat)$value
integrate(posterior_times_sigma2, lower=0, upper=0.2, dat=dat)$value
integrate(posterior_times_sigma2, lower=0, upper=0.1, dat=dat)$value
integrate(posterior_times_sigma2, lower=0, upper=0.006, dat=dat)$value
Different integration methods might help as well, such as those provided by the pracma package:
library(pracma)
integral(posterior_times_sigma2, xmin=0, xmax=Inf, dat=dat)
This is the whole script:
##dat = read.table("c://AAPL_logret.txt")$V1
set.seed(1233)
dat <- rlnorm(100, -3, 0.1)
mu= 0.005
theta = 200
posterior = function (sigma2, dat){
numerator = function (sigma2, dat) {
out = dexp(sigma2, theta)
for (i in 1:length(dat)) {
out = out * dnorm(dat[i], mu, sqrt(sigma2))
}
return(out)
}
denominator = integrate(numerator,lower = 0, upper = Inf, dat=dat)$value
return (numerator(sigma2,dat)/denominator)
}
curve(posterior(x,dat), from=0, to = 0.006,
xlab=expression(sigma^2), ylab = "Density", cex.axis = 1.3,
cex.lab=1.3, col=4, lwd=1.5, lty=4, n=20000)
integrate(posterior, lower=0, upper=0.006, dat=dat)$value
posterior_times_sigma2 = function(sigma2,dat){
return(sigma2 * posterior(sigma2,dat))
}
integrate(posterior_times_sigma2, lower=0, upper=Inf, dat=dat)$value
integrate(posterior_times_sigma2, lower=0, upper=0.2, dat=dat)$value
integrate(posterior_times_sigma2, lower=0, upper=0.1, dat=dat)$value
integrate(posterior_times_sigma2, lower=0, upper=0.006, dat=dat)$value
library(pracma)
integral(posterior_times_sigma2, xmin=0, xmax=Inf, dat=dat)
integral(posterior_times_sigma2, xmin=0, xmax=Inf, dat=dat, method="Kronrod")
Upvotes: 3