Tom Kurushingal

Reputation: 6496

How to smoothen data in Python?

I am trying to smooth the scatter plot shown below using SciPy's B-spline representation of a 1-D curve. The data is available here.

(image: scatter plot of the data)

The code I used is:

import matplotlib.pyplot as plt
import numpy as np
from scipy import interpolate

data = np.genfromtxt("spline_data.dat", delimiter = '\t')
x = 1000 / data[:, 0]
y = data[:, 1]
x_int = np.linspace(x[0], x[-1], 100)
tck = interpolate.splrep(x, y, k = 3, s = 1)
y_int = interpolate.splev(x_int, tck, der = 0)

fig = plt.figure(figsize = (5.15,5.15))
plt.subplot(111)
plt.plot(x, y, marker = 'o', linestyle='')
plt.plot(x_int, y_int, linestyle = '-', linewidth = 0.75, color='k')
plt.xlabel("X")
plt.ylabel("Y")
plt.show()

I tried changing the order of the spline and the smoothing condition, but I still do not get a smooth curve.

B-spline interpolation should be able to smooth the data, so what is going wrong? Is there an alternative method for smoothing this data?

Upvotes: 3

Views: 11865

Answers (3)

Andrzej Pronobis

Reputation: 36096

Assuming we are dealing with noisy observations of some phenomenon, Gaussian Process Regression might also be a good choice. Knowledge about the variance of the noise can be included through the nugget parameter, and the other parameters can be found using maximum likelihood estimation. Here's a simple example of how it could be applied:

import matplotlib.pyplot as plt
import numpy as np
from sklearn.gaussian_process import GaussianProcess

data = np.genfromtxt("spline_data.dat", delimiter='\t')
x = 1000 / data[:, 0]
y = data[:, 1]
x_pred = np.linspace(x[0], x[-1], 100)

# <GP regression>
gp = GaussianProcess(theta0=1, thetaL=0.00001, thetaU=1000, nugget=0.000001)
gp.fit(np.atleast_2d(x).T, y)
y_pred = gp.predict(np.atleast_2d(x_pred).T)
# </GP regression>

fig = plt.figure(figsize=(5.15, 5.15))
plt.subplot(111)
plt.plot(x, y, marker='o', linestyle='')
plt.plot(x_pred, y_pred, linestyle='-', linewidth=0.75, color='k')
plt.xlabel("X")
plt.ylabel("Y")
plt.show()

which will give:

(image: plot of the data with the GP regression curve)
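A note for readers on newer scikit-learn: the GaussianProcess class used above was deprecated and later removed in favour of GaussianProcessRegressor. Below is a rough sketch of the same idea with the newer class. The synthetic data (standing in for spline_data.dat) and the RBF + WhiteKernel kernel are my assumptions, not the settings of the original answer; WhiteKernel takes over the role of the nugget.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Synthetic stand-in for spline_data.dat (assumption: noisy samples of a smooth curve).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 60)
y = np.sin(x) + rng.normal(scale=0.2, size=x.size)
x_pred = np.linspace(x[0], x[-1], 100)

# RBF models the smooth trend; WhiteKernel models the observation noise.
kernel = 1.0 * RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True, random_state=0)

# fit() tunes the kernel hyperparameters by maximising the marginal likelihood.
gp.fit(x.reshape(-1, 1), y)

# Predictive mean and standard deviation on the dense grid.
y_pred, y_std = gp.predict(x_pred.reshape(-1, 1), return_std=True)
```

y_pred can be plotted against x_pred exactly as in the code above, and y_std gives an uncertainty band if one is wanted.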

Upvotes: 4

wordsforthewise

Reputation: 15777

In your specific case, you could also try reducing the last argument of np.linspace (the number of points at which the spline is evaluated), for example np.linspace(x[0], x[-1], 10).

Demo code:

import matplotlib.pyplot as plt
import numpy as np
from scipy import interpolate

data = np.random.rand(100,2)
tempx = list(data[:, 0])
tempy = list(data[:, 1])
x = np.array(sorted([point*10 + tempx.index(point) for point in tempx]))
y = np.array([point*10 + tempy.index(point) for point in tempy])
x_int = np.linspace(x[0], x[-1], 10)
tck = interpolate.splrep(x, y, k = 3, s = 1)
y_int = interpolate.splev(x_int, tck, der = 0)

fig = plt.figure(figsize = (5.15,5.15))
plt.subplot(111)
plt.plot(x, y, marker = 'o', linestyle='')
plt.plot(x_int, y_int, linestyle = '-', linewidth = 0.75, color='k')
plt.xlabel("X")
plt.ylabel("Y")
plt.show()

You could also smooth the data with a rolling_mean in pandas:

import pandas as pd

data = [...(your data here)...]

smoothendData = pd.rolling_mean(data,5)

The second argument of rolling_mean is the moving average (rolling mean) period. You can also reverse the data (data.reverse()), take a rolling_mean of the reversed data, and combine it with the forward rolling mean. Another option is an exponentially weighted moving average: Pandas: Exponential smoothing function for column
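For anyone on a recent pandas: the top-level pd.rolling_mean function was deprecated and then removed, and rolling windows are now methods on Series and DataFrame. Here is a minimal sketch of the options mentioned above using that API. The synthetic series and the variable names are placeholders of mine, not part of the original data.

```python
import numpy as np
import pandas as pd

# Synthetic noisy series standing in for "your data" (assumption).
rng = np.random.default_rng(0)
data = pd.Series(np.sin(np.linspace(0, 6, 200)) + rng.normal(scale=0.3, size=200))

window = 5  # rolling-mean period

# Trailing window: the first window-1 values are NaN.
forward = data.rolling(window).mean()

# Same window applied to the reversed series, then flipped back.
backward = data[::-1].rolling(window).mean()[::-1]

# Average of the forward and backward passes (aligned on the index).
combined = (forward + backward) / 2

# A centred window removes the lag in a single pass.
centered = data.rolling(window, center=True).mean()

# Exponentially weighted moving average.
ewma = data.ewm(span=window).mean()
```

The centred window is usually the simpler way to get what the forward/backward combination is after, at the cost of NaN values at both ends.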

Or use a bandpass filter: see "fft bandpass filter in python" and http://docs.scipy.org/doc/scipy/reference/signal.html
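As a hedged illustration of the filtering route: for plain smoothing, a low-pass filter (rather than a bandpass) is usually enough. The sketch below uses scipy.signal.butter and scipy.signal.filtfilt on synthetic, evenly sampled data. The filter order and cutoff are arbitrary choices of mine and would need tuning for real data, and the approach assumes the samples are evenly spaced.

```python
import numpy as np
from scipy import signal

# Synthetic, evenly sampled signal: a slow 3 Hz sine plus noise (assumption).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 500)
clean = np.sin(2 * np.pi * 3 * t)
noisy = clean + rng.normal(scale=0.3, size=t.size)

# 4th-order Butterworth low-pass; the cutoff is given as a fraction
# of the Nyquist frequency (0.1 here, an arbitrary illustrative value).
b, a = signal.butter(4, 0.1)

# filtfilt runs the filter forwards and backwards, so the output
# has no phase shift relative to the input.
smoothed = signal.filtfilt(b, a, noisy)
```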

Upvotes: 0

jme

Reputation: 20695

Use a larger smoothing parameter. For example, s=1000:

tck = interpolate.splrep(x, y, k=3, s=1000)

This produces:

(image: plot of the data with the smoothed spline, s=1000)
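Some background on why this helps: in splrep, s is an upper bound on the (weighted) sum of squared residuals between the data and the spline. If the scatter in the data is much larger than s allows, the spline is forced to chase the noise. Here is a small sketch on synthetic data (not the data from the question) that compares a few values of s; the numbers are illustrative only.

```python
import numpy as np
from scipy import interpolate

# Synthetic noisy data (assumption): smooth curve plus noise with std 3.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 200)
y = 10 * np.sin(x) + rng.normal(scale=3.0, size=x.size)

knots = {}
residuals = {}
for s in (1, 100, 2000):
    # Cubic smoothing spline; larger s permits a looser fit with fewer knots.
    t, c, k = interpolate.splrep(x, y, k=3, s=s)
    fitted = interpolate.splev(x, (t, c, k))
    knots[s] = len(t)
    residuals[s] = np.sum((y - fitted) ** 2)
    print(f"s={s:5d}  knots={knots[s]:3d}  sum of squared residuals={residuals[s]:.1f}")
```

With unit weights, a reasonable starting point for s is roughly the number of points times the noise variance, adjusted from there by eye.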

Upvotes: 4
