Calculate the volume under a plot of kernel bivariate density estimation

Question

I need to calculate a measure called mutual information. First of all, I need to calculate another measure, called entropy, for example, the joint entropy of x and y:

-∬p(x,y)·log p(x,y)dxdy

So, to calculate p(x,y), I used the kernel density estimator (in this way, function kde2d, and it returned the Z values (probability of having x and y in that window).

Again, by now, I have a matrix of Z values [1x100] x [1x100], that's equal my p(x,y). But I have to integrate it, by discovering the volume under the surface (doble integral). But I didn't found a way to do that. The function quad2d, to compute the double quadrature didn't work, because I only integrated a numerical matrix p(x,y), and it gives me a constant....

Anyone knows something to find that volume/calculate the double integral?

The image of the plot from persp3d:

Thanks everybody !!!!

Zheyuan Li · Accepted Answer

Once you have the results from kde2d, it is very straighforward to compute a numerical integral. The example session below sketches how to do it.

As you know, numerical double integral is just a 2D summation. The kde2d, by default takes range(x) and range(y) as 2D domain. I see that you got a 100*100 matrix, so I think you have set n = 100 in using kde2d. Now, kde$x, kde$y defines a 100 * 100 grid, with den$z giving density on each grid cell. It is easy to compute the size of each grid cell (they are all equal), then we do three steps:

find normalizing constants; although we know that in theory, density sums up (or integrates) to 1, but after computer discretization, it only approximates 1. So we first compute this normalizing constant for later rescaling;
the integrand for entropy is z * log(z); since z is a 100 * 100 matrix, this is also a matrix. You simply sum them up, and multiply it by the cell size cell_size, then you get a non-normalized entropy;
rescale the non-normalized entropy for a normalized one.

## sample data: bivariate normal, with covariance/correlation 0
set.seed(123); x <- rnorm(1000, 0, 2)  ## marginal variance: 4
set.seed(456); y <- rnorm(1000, 0, 2)  ## marginal variance: 4

## load MASS
library(MASS)

## domain:
xlim <- range(x)
ylim <- range(y)
## 2D Kernel Density Estimation
den <- kde2d(x, y, n = 100, lims = c(xlim, ylim))
##persp(den$x,den$y,den$z)
z <- den$z  ## extract density

## den$x, den$y expands a 2D grid, with den$z being density on each grid cell
## numerical integration is straighforward, by aggregation over all cells
## the size of each grid cell (a rectangular cell) is:
cell_size <- (diff(xlim) / 100) * (diff(ylim) / 100)

## normalizing constant; ideally should be 1, but actually only close to 1 due to discretization
norm <- sum(z) * cell_size

## your integrand: z * log(z) * (-1):
integrand <- z * log(z) * (-1)

## get numerical integral by summation:
entropy <- sum(integrand) * cell_size

## self-normalization:
entropy <- entropy / norm

Verification

The above code gives entropy of 4.230938. Now, Wikipedia - Multivariate normal distribution gives entropy formula:

(k / 2) * (1 + log(2 * pi)) + (1 / 2) * log(det(Sigma))

For the above bivariate normal distribution, we have k = 2. We have Sigma (covariance matrix):

4  0
0  4

whose determinant is 16. Hence, the theoretical value is:

(1 + log(2 * pi)) + (1 / 2) * log(16) = 4.224171

Good match!

Calculate the volume under a plot of kernel bivariate density estimation

Answers (1)

Related Questions