Z B
Z B

Reputation: 131

integrating the square of probability density?

Suppose I have

set.seed(2020)    # make the results reproducible
a <- rnorm(100, 0, 1)

My probability density is estimated through kernel density estimator (gaussian) in R using the R built in function density. The question is how to integrate the square of the estimated density. It does not matter between which values, let us suppose between -Inf and +Inf. I have tried the following:

f <- approxfun(density(a)$x, density(a)$y)
integrate (f*f, min(density(a)$x), max(density(a)$x))

Upvotes: 3

Views: 383

Answers (3)

Allan Cameron
Allan Cameron

Reputation: 173793

There are a couple of problems here. First you have the x and y round the wrong way in approxfun. Secondly, you can't multiply function names together. You need to specify a new function that gives you the square of your original function:

set.seed(2020)   
a  <- rnorm(100, 0, 1)
f  <- approxfun(density(a)$x, density(a)$y)
f2 <- function(v) ifelse(is.na(f(v)), 0, f(v)^2)

integrate (f2, -Inf, Inf)
#> 0.2591153 with absolute error < 0.00011

We can also plot the original density function and the squared density function:

curve(f, -3, 3)
curve(f2, -3, 3, add = TRUE, col = "red")

Upvotes: 3

Rui Barradas
Rui Barradas

Reputation: 76402

Here is a way using package caTools, function trapz. It computes the integral given a vector x and its corresponding image y using the trapezoidal rule.
I also include a function trapzf based on the original to have the integral computed with the function returned by approxfun

trapzf <- function(x, FUN) trapz(x, FUN(x))

set.seed(2020)    # make the results reproducible
a <- rnorm(100, 0, 1)

d <- density(a)
f <- approxfun(d$x, d$y)

int1 <- trapz(d$x, d$y^2)
int2 <- trapzf(d$x, function(x) f(x)^2)

int1
#[1] 0.2591226

identical(int1, int2)
#[1] TRUE

Upvotes: 2

ThomasIsCoding
ThomasIsCoding

Reputation: 101064

I think you should write the objective function as function(x) f(x)**2, rather than f*f, e.g.,

> integrate (function(x) f(x)**2, min(density(a)$x), max(density(a)$x))
0.2331793 with absolute error < 6.6e-06

Upvotes: 2

Related Questions