Laura
Laura

Reputation: 360

Python - integrate 2D Kernel Density Estimation within contour lines

I would like to draw a contour plot of a Kernel Density Estimation, where the KDE is integrated within each of the contour plot filled areas.

As an example, imagine I calculate the KDE of 2D data:

data = np.random.multivariate_normal((0, 0), [[1, 1], [2, 0.7]], 100)
x = data[:, 0]
y = data[:, 1]
xmin, xmax = min(x), max(x)
ymin, ymax = min(y), max(y)
xx, yy = np.mgrid[xmin:xmax:100j, ymin:ymax:100j]
positions = np.vstack([xx.ravel(), yy.ravel()])
values = np.vstack([x, y])
kernel = st.gaussian_kde(values)
f = np.reshape(kernel(positions).T, xx.shape)

I know how to draw the contourplot of the KDE.

fig = plt.figure()
ax = fig.gca()
ax.set_xlim(xmin, xmax)
ax.set_ylim(ymin, ymax)
cfset = ax.contourf(xx, yy, f, cmap='Blues')
cset = ax.contour(xx, yy, f, colors='k')
plt.show()

However, this contourplot shows what is the probability density within each of the filled areas. Instead, I would like the plot to indicate the total probability to fall within each of the filled areas.

Upvotes: 4

Views: 2273

Answers (1)

Paul Panzer
Paul Panzer

Reputation: 53029

Please note that the following is only correct as long as your contours are 'monotonic', i.e. inside a contour line you find only pixel values above the correspoonding contour level. Also note that if your density is multipeaked corresponding areas in separate peaks are lumped together.

If this is true/acceptable your problem can be solved by ordering the pixels by value.

I don't know by which heuristic your plot program chooses its contour levels, but assuming you have them stored (in ascending order, say) in a variable called 'levels' you could try something like

ff = f.ravel()
order = np.argsort(ff)
fsorted = ff[order]
F = np.cumsum(fsorted)
# depending on how your density is normalised next line may be superfluous
# also note that this is only correct for equal bins
# and, finally, to be unimpeachably rigorous, this disregards the probability
# mass outside the field of view, so it calculates probability condtional
# on being in the field of view
F /= F[-1]
boundaries = fsorted.searchsorted(levels)
new_levels = F[boundaries]

Now, for you to be able to use this your plot program must either allow you to freely choose the contour labels or at least to choose the levels at which to put the contours. In the latter case, assuming there's a kwarg 'levels'

# make a copy to avoid problems with in-place shuffling
# i.e. overwriting positions whose original values are still to be read out
F[order] = F.copy()
F.shape = f.shape
cset = ax.contour(xx, yy, F, levels=new_levels, colors='k')

I've copied the following from one of your comments to make it more visible

Finally, if one wants to really have the probability within each filled area, this is a workaround that works: cb = fig.colorbar(cfset, ax = ax) values = cb.values.copy() values[1:] -= values[:-1].copy() cb.set_ticklabels(values) – Laura

Upvotes: 1

Related Questions