DataTx
DataTx

Reputation: 1869

MATLAB ksdensity equivalent in Python

I've looked online and have yet to find an answer or way to figure the following

I'm translating some MATLAB code to Python where in MATLAB im looking to find the kernel density estimation with the function:

[p,x] = ksdensity(data)

where p is the probability at point x in the distribution.

Scipy has a function but only returns p.

Is there a way to find the probability at values of x?

Thanks!

Upvotes: 11

Views: 7102

Answers (3)

lollows
lollows

Reputation: 71

Matlab is orders of magnitude faster than KernelDensity when it comes to finding the optimal bandwidth. Any idea of how to make the KernelDenisty faster? – Yuca Jul 16 '18 at 20:58

Hi, Yuca. The matlab use Scott rule to estimate the bandwidth, which is fast but requires the data from the normal distribution. For more information, please see this Post.

Upvotes: 1

paisanco
paisanco

Reputation: 4164

Another option is the kernel density estimator in the Scikit-Learn Python package, sklearn.neighbors.KernelDensity

Here is a little example similar to the Matlab documentation for ksdensity for a Gaussian distribution:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import KernelDensity

np.random.seed(12345)
# similar to MATLAB ksdensity example x = [randn(30,1); 5+randn(30,1)];
Vecvalues=np.concatenate((np.random.normal(0,1,30), np.random.normal(5,1,30)))[:,None]
Vecpoints=np.linspace(-8,12,100)[:,None]
kde = KernelDensity(kernel='gaussian', bandwidth=0.5).fit(Vecvalues)
logkde = kde.score_samples(Vecpoints)
plt.plot(Vecpoints,np.exp(logkde))
plt.show()

The plot this produces looks like:

enter image description here

Upvotes: 5

Robert Kern
Robert Kern

Reputation: 13430

That form of the ksdensity call automatically generates an arbitrary x. scipy.stats.gaussian_kde() returns a callable function that can be evaluated with any x of your choosing. The equivalent x would be np.linspace(data.min(), data.max(), 100).

import numpy as np
from scipy import stats

data = ...
kde = stats.gaussian_kde(data)
x = np.linspace(data.min(), data.max(), 100)
p = kde(x)

Upvotes: 7

Related Questions