Erick Gomez


How to smooth a line using a Gaussian KDE kernel in Python, setting a bandwidth

I am trying to smooth the following data using Python's gaussian_kde, but it is not working properly. It looks like the KDE is computing the distribution of the whole dataset instead of using a bandwidth around each point and weighting the neighbouring points to do the smoothing.

from scipy.stats import gaussian_kde
import matplotlib.pyplot as plt
import numpy as np

y = [191.78, 191.59, 191.59, 191.41, 191.47, 191.33, 191.25,
     191.33, 191.48, 191.48, 191.51, 191.43, 191.42, 191.54,
     191.5975, 191.555, 191.52, 191.25, 191.15, 191.01]
x = np.linspace(1, 20, len(y))

# build the KDE and narrow its bandwidth to a third of the default factor
kde = gaussian_kde(y)
kde.set_bandwidth(bw_method=kde.factor / 3)

fig, ax = plt.subplots(figsize=(10, 10))
ax.scatter(x, y, color='black', label='data')
ax.plot(x, y, color='red', label='line')
ax.plot(x, kde(x), label='kde(x)')
# call legend after plotting so it finds the labelled artists
ax.legend(loc='center left', bbox_to_anchor=(1.05, 0.5), frameon=False)
plt.show()

Here is the chart of the data:

Chart of the data without smoothing

After applying the KDE, the line is not smoothed:

Chart after smoothing

Upvotes: 5


Answers (1)

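You are thinking that gaussian_kde smooths a line, but what it actually does is estimate (and smooth) the probability density of a set of samples. Your data isn't a set of samples like that; it's x/y coordinates. In your code, gaussian_kde(y) builds a density over the y values, which all lie around 191, so kde(x) evaluates that density at 1 to 20, where it is essentially zero.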
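A minimal sketch of that difference, assuming NumPy and SciPy and reusing the question's data: `gaussian_kde` gives a density over the y values, so evaluating it at x = 1..20 returns essentially zero, while the probability it assigns to the range of the y values is close to 1. The per-point bandwidth weighting described in the question is closer to Nadaraya-Watson kernel regression, a different technique from anything in the code here; `gaussian_kernel_smooth` below is a hypothetical helper, and the bandwidth of 1.5 x-units is an arbitrary choice.

```python
import numpy as np
from scipy.stats import gaussian_kde

y = np.array([191.78, 191.59, 191.59, 191.41, 191.47, 191.33, 191.25,
              191.33, 191.48, 191.48, 191.51, 191.43, 191.42, 191.54,
              191.5975, 191.555, 191.52, 191.25, 191.15, 191.01])
x = np.linspace(1, 20, len(y))

# gaussian_kde treats y as 20 samples of one random variable and
# estimates their probability density; x plays no part in it
kde = gaussian_kde(y)
kde.set_bandwidth(bw_method=kde.factor / 3)

# evaluating at x = 1..20 asks for the density of *values* near 1..20,
# which is essentially zero because every sample is near 191
density_at_x = kde(x)

# the density is only meaningful over the range of the y values
y_grid = np.linspace(y.min() - 0.5, y.max() + 0.5, 200)
density_at_y = kde(y_grid)
total_prob = kde.integrate_box_1d(y.min() - 0.5, y.max() + 0.5)  # close to 1


def gaussian_kernel_smooth(x, y, x_eval, bandwidth):
    """Hypothetical helper: Nadaraya-Watson estimate, i.e. a
    Gaussian-weighted mean of y around each point of x_eval."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_eval = np.asarray(x_eval, dtype=float)
    # distances scaled by the bandwidth, shape (len(x_eval), len(x))
    u = (x_eval[:, None] - x[None, :]) / bandwidth
    weights = np.exp(-0.5 * u ** 2)
    # normalise the weights at each evaluation point and average y;
    # a bandwidth far smaller than the x spacing can make all weights underflow
    return weights @ y / weights.sum(axis=1)


x_smooth = np.linspace(x.min(), x.max(), 200)
y_kernel = gaussian_kernel_smooth(x, y, x_smooth, bandwidth=1.5)
```

Plotting `x_smooth` against `y_kernel` gives a smoothed line; a larger `bandwidth` averages over more neighbours and smooths more.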

Here are some ways of smoothing x/y data:

import matplotlib.pyplot as plt
import numpy as np
from scipy import interpolate, ndimage

y = [191.78, 191.59, 191.59, 191.41, 191.47, 191.33, 191.25,
     191.33, 191.48, 191.48, 191.51, 191.43, 191.42, 191.54,
     191.5975, 191.555, 191.52, 191.25, 191.15, 191.01]
x = np.linspace(1, 20, len(y))

# convert both to arrays
x_sm = np.array(x)
y_sm = np.array(y)

# resample to many more points - needed to draw the smoothed curves
x_smooth = np.linspace(x_sm.min(), x_sm.max(), 200)

# cubic interpolating spline - always goes through all the data points x/y
# (interpolate.spline was deprecated and later removed from SciPy;
#  make_interp_spline is its replacement)
y_spline = interpolate.make_interp_spline(x_sm, y_sm, k=3)(x_smooth)

# smoothing spline - does not have to pass through the data points
spl = interpolate.UnivariateSpline(x_sm, y_sm)

# Gaussian filter - smooth only the y values (sigma is in samples);
# x stays on the original grid
sigma = 2
y_g1d = ndimage.gaussian_filter1d(y_sm, sigma)

fig, ax = plt.subplots(figsize=(10, 10))
ax.plot(x_sm, y_sm, 'green', linewidth=1, label='data')
ax.plot(x_smooth, y_spline, 'red', linewidth=1, label='spline')
ax.plot(x_smooth, spl(x_smooth), 'yellow', linewidth=1, label='UnivariateSpline')
ax.plot(x_sm, y_g1d, 'magenta', linewidth=1, label='gaussian_filter1d')
# call legend after plotting so it finds the labelled lines
ax.legend(loc='center left', bbox_to_anchor=(1.05, 0.5), frameon=False)

plt.show()

The plot looks like this:


Green is your original data, red is the spline, yellow is the UnivariateSpline and magenta is the gaussian_filter1d-filtered data. These functions have parameters that control how much smoothing is applied, such as sigma for gaussian_filter1d and the smoothing factor s for UnivariateSpline; see the SciPy documentation for each of them.
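To illustrate how those parameters behave, here is a sketch on the same data. The sigma values, s=0.1, and the roughness measure (sum of squared successive differences) are illustrative choices, not recommendations. Note that the default s for UnivariateSpline is len(y), i.e. 20 here, which is far larger than the data's total squared deviation from its mean (about 0.6), so with the default the spline falls back to a single least-squares cubic with no interior knots.

```python
import numpy as np
from scipy import interpolate, ndimage

y = np.array([191.78, 191.59, 191.59, 191.41, 191.47, 191.33, 191.25,
              191.33, 191.48, 191.48, 191.51, 191.43, 191.42, 191.54,
              191.5975, 191.555, 191.52, 191.25, 191.15, 191.01])
x = np.linspace(1, 20, len(y))

# gaussian_filter1d: sigma is the kernel width in samples;
# a larger sigma averages over more neighbours
y_sigma1 = ndimage.gaussian_filter1d(y, sigma=1)
y_sigma4 = ndimage.gaussian_filter1d(y, sigma=4)

# illustrative roughness measure: sum of squared successive differences
rough1 = np.sum(np.diff(y_sigma1) ** 2)
rough4 = np.sum(np.diff(y_sigma4) ** 2)

# UnivariateSpline: s bounds the sum of squared residuals
# s=0 forces the spline through every data point (pure interpolation)
spl_interp = interpolate.UnivariateSpline(x, y, s=0)
# a small positive s lets the curve deviate from the points
spl_smooth = interpolate.UnivariateSpline(x, y, s=0.1)
# default s = len(y) = 20: only the two boundary knots remain (a plain cubic)
spl_default = interpolate.UnivariateSpline(x, y)
```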

Upvotes: 5
