Inconsistency between gaussian_kde and density integral sum

Question

Can someone explain why, after estimating a kernel density

import numpy as np
from scipy.stats import gaussian_kde
d = gaussian_kde(g[:,1])

and computing its integral:

x = np.linspace(0, g[:,1].max(), 1500)
integral = np.trapz(d(x), x)

the resulting integral is nowhere near 1:

print(integral)
Out: 0.55618

cr1msonB1ade · Accepted Answer

As stated in my comment, this is an issue with the support of the kernel density. The Gaussian kernel has infinite support: even when it is fit on data within a specific range, the estimated density extends from negative to positive infinity. Your integral only covers 0 to the data maximum, so it leaves out whatever mass the estimate places outside that interval. That being said, the large majority of the density will lie within a range reasonably close to the range of the fitted data.
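To illustrate the point about support, here is a minimal sketch that compares the mass on the interval used in the question with the mass over the whole real line. The exponential sample is a made-up stand-in for `g[:,1]` (which is not shown in the question), chosen only because it piles up near 0; `integrate_box_1d` is a `gaussian_kde` method that integrates the estimate between two bounds.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
# Hypothetical stand-in for g[:,1]: non-negative data concentrated near 0
sample = rng.exponential(scale=1.0, size=500)

kde = gaussian_kde(sample)

# Mass on the interval integrated in the question: [0, max of the data]
inside = kde.integrate_box_1d(0.0, sample.max())
# Mass over the full support of the Gaussian kernels
total = kde.integrate_box_1d(-np.inf, np.inf)

print(f"mass on [0, max]:    {inside:.4f}")
print(f"mass on (-inf, inf): {total:.4f}")
```

With data like this, a visible share of the mass sits below 0, so the first number falls short of 1 while the second does not. How much is missing depends on the data and the bandwidth.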

If you would like a kernel density estimate confined to the range of your original data, you can fit a truncated Gaussian kernel: truncate the kernel to that range and re-normalize the truncated portion so that it integrates to 1. I am not sure that is what you want here, though. The same logic also lets you truncate to a non-negative Gaussian kernel.
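One simple way to sketch the re-normalization idea is to rescale the whole estimate by its mass on the chosen interval and set it to zero elsewhere. This is an assumption about how to implement it, not necessarily the exact construction meant above: truncating and re-normalizing each kernel individually would give a somewhat different density. The helper name `truncated_kde` and the exponential sample are hypothetical.

```python
import numpy as np
from scipy.stats import gaussian_kde

def truncated_kde(sample, lower, upper):
    """Return a density that is 0 outside [lower, upper] and integrates to 1 on it."""
    kde = gaussian_kde(sample)
    # Mass the untruncated estimate places on [lower, upper]
    mass = kde.integrate_box_1d(lower, upper)

    def density(x):
        x = np.atleast_1d(np.asarray(x, dtype=float))
        in_range = (x >= lower) & (x <= upper)
        # Rescale inside the interval, zero outside
        return np.where(in_range, kde(x) / mass, 0.0)

    return density

rng = np.random.default_rng(0)
sample = rng.exponential(scale=1.0, size=500)  # stand-in for g[:,1]

dens = truncated_kde(sample, 0.0, sample.max())
x = np.linspace(0.0, sample.max(), 1500)
y = dens(x)
# Trapezoidal rule written out by hand, same idea as np.trapz in the question
area = np.sum((y[1:] + y[:-1]) / 2 * np.diff(x))
print(f"integral on [0, max]: {area:.4f}")
```

Passing `lower=0.0` and `upper=np.inf` instead would give the non-negative variant.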
