Adele

Reputation: 23

Bandwidth kernel density python

I'm trying to calculate the kernel density function of a list of values:

x=[-0.04124324405924407, 0, 0.005249724476788287, 0.03599351958245578, -0.00252785423151014, 0.01007584102031178, -0.002510349639322063, -0.01264302961474806, -0.01797169063489579]

I am following this website: http://mark-kay.net/2013/12/24/kernel-density-estimation/. I want to calculate the best value for the bandwidth, so I wrote this piece of code:

from sklearn.grid_search import GridSearchCV
grid = GridSearchCV(KernelDensity(),{'bandwidth': np.linspace(-1.0, 1.0, 30)},cv=20) # 20-fold cross-validation
grid.fit(x[:, None])
grid.best_params_

but when I run this:

grid.fit(x[:, None])

I get this error:

Error: TypeError: list indices must be integers, not tuple

Does anyone know how to fix it? Thanks

Upvotes: 2

Views: 1425

Answers (2)

Michael Baudin

Reputation: 1151

Given the small sample size, I would use OpenTURNS's KernelSmoothing class. It provides Scott's multidimensional rule by default. If needed, we can use Sheather and Jones's direct plugin algorithm, which provides a good bandwidth in many cases, even if the distribution is multimodal.

The following script uses the default bandwidth.

x = [
    -0.04124324405924407,
    0,
    0.005249724476788287,
    0.03599351958245578,
    -0.00252785423151014,
    0.01007584102031178,
    -0.002510349639322063,
    -0.01264302961474806,
    -0.01797169063489579,
]
import openturns as ot
sample = ot.Sample(x, 1)
factory = ot.KernelSmoothing()
distribution = factory.build(sample)

And that's it.

For a smarter bandwidth selection, we can use the computePluginBandwidth method, which is based on Sheather and Jones's direct "solve-the-equation" rule. The following script evaluates the bandwidth and then plots the distribution.

bandwidth = factory.computePluginBandwidth(sample)
distribution = factory.build(sample, bandwidth)
distribution.drawPDF()

The bandwidth is evaluated as 0.00941247. The PDF is the following.

[Figure: PDF estimated from KernelSmoothing]
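
To make the role of the bandwidth concrete, here is a minimal standard-library sketch that evaluates a kernel density estimate by hand with the bandwidth reported above. It assumes a Gaussian kernel; the helper name gaussian_kde_pdf and the evaluation grid are illustrative choices, not part of OpenTURNS, and the result is not guaranteed to match the library's output exactly.

```python
import math

# Sample from the question.
x = [-0.04124324405924407, 0, 0.005249724476788287, 0.03599351958245578,
     -0.00252785423151014, 0.01007584102031178, -0.002510349639322063,
     -0.01264302961474806, -0.01797169063489579]

# Bandwidth reported above for the plugin rule.
bandwidth = 0.00941247


def gaussian_kde_pdf(t, sample, h):
    """Kernel density estimate at t: the average of Gaussian bumps of width h.

    Hypothetical helper for illustration; assumes a Gaussian kernel.
    """
    norm = 1.0 / (len(sample) * h * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((t - xi) / h) ** 2) for xi in sample)


# Evaluate on a grid from -0.1 to 0.1, wide enough to cover the tails.
step = 0.0005
grid = [-0.1 + i * step for i in range(401)]
pdf = [gaussian_kde_pdf(t, x, bandwidth) for t in grid]

# Riemann sum of the estimate; a valid density should integrate to about 1.
area = sum(pdf) * step
print(f"area under the estimated PDF: {area:.4f}")
```

A smaller bandwidth makes each bump narrower and the estimate spikier; a larger one smooths the nine points into a single broad hump.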

Upvotes: 2

Ilja Everilä

Reputation: 53007

You are using a Python list where you should use a numpy.array. The expression x[:, None] indexes with a tuple, which a list does not support (hence the TypeError) but a NumPy array does.

import numpy as np
x = np.array([-0.04124324405924407, 0, 0.005249724476788287, 0.03599351958245578, -0.00252785423151014, 0.01007584102031178, -0.002510349639322063, -0.01264302961474806, -0.01797169063489579])
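
With x as an array, the grid search from the question can run end to end. The sketch below is one possible version and makes several assumptions that are not in the original code: it imports GridSearchCV from sklearn.model_selection (the sklearn.grid_search module was removed in later scikit-learn releases), it restricts the candidate bandwidths to positive values, and it uses leave-one-out cross-validation because 20 folds cannot be built from 9 points. The bandwidth range itself is an arbitrary illustrative choice.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, LeaveOneOut
from sklearn.neighbors import KernelDensity

x = np.array([-0.04124324405924407, 0, 0.005249724476788287,
              0.03599351958245578, -0.00252785423151014,
              0.01007584102031178, -0.002510349639322063,
              -0.01264302961474806, -0.01797169063489579])

# Assumption: candidate bandwidths must be strictly positive, so the grid
# starts above zero instead of at -1.0 as in the question.
params = {"bandwidth": np.linspace(0.001, 0.05, 30)}

# Assumption: leave-one-out replaces cv=20, since there are only 9 samples.
grid = GridSearchCV(KernelDensity(kernel="gaussian"), params, cv=LeaveOneOut())

# x[:, None] turns the shape (9,) array into the (9, 1) matrix
# that scikit-learn estimators expect.
grid.fit(x[:, None])

best_bandwidth = grid.best_params_["bandwidth"]
print(best_bandwidth)
```

If the selected bandwidth lands on either end of the candidate range, widen the range and rerun the search.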

Upvotes: 1
