Reputation: 787
How can I plot the following noisy data with a smooth, continuous line without considering each individual value? I would like to only show the behavior in a nicer way, without caring about noisy and extreme values. This is the code I am using:
import numpy
import sys
import matplotlib.pyplot as plt
from scipy.interpolate import spline
dataset = numpy.genfromtxt(fname='data', delimiter=",")
dic = {}
for d in dataset:
dic[d[0]] = d[1]
plt.plot(range(len(dic)), dic.values(),linestyle='-', linewidth=2)
plt.savefig('plot.png')
plt.show()
Upvotes: 5
Views: 7017
Reputation: 1833
There is more than one way to do it!
Here I show how to reduce noise using a variety of techniques:
Sticking with @Hooked example data for consistency:
import numpy as np
import matplotlib.pyplot as plt
X = np.arange(1, 1000, 1)
Y = np.log(X ** 3) + 10 * np.random.random(X.shape)
plt.plot(X, Y, alpha = .5)
plt.show()
Sometimes all you need is a moving average.
For example, using pandas with a window size of 100:
import pandas as pd
df = pd.DataFrame(Y, X)
df_mva = df.rolling(100).mean() # moving average with a window size of 100
df_mva.plot(legend = False);
You will probably have to try several window sizes with your data. Note that the first 100 values of df_mva
will be NaN but these can be removed with the dropna
method.
Usage details for the pandas rolling function.
I've used LOWESS (Locally Weighted Scatterplot Smoothing) successfully to remove noise from repeated measures datasets. More information on local regression methods, including LOWESS and LOESS, here. It's a simple method with only one parameter to tune which in my experience gives good results.
Here is how to apply the LOWESS technique using the statsmodels implementation:
import statsmodels.api as sm
y_lowess = sm.nonparametric.lowess(Y, X, frac = 0.3) # 30 % lowess smoothing
plt.plot(y_lowess[:, 0], y_lowess[:, 1]) # some noise removed
plt.show()
It may be necessary to vary the frac
parameter, which is the fraction of the data used when estimating each y value. Increase the frac
value to increase the amount of smoothing. The frac
value must be between 0 and 1.
Further details on statsmodels lowess usage.
Scipy provides a set of low pass filters which may be appropriate.
After application of the lfiter:
from scipy.signal import lfilter
n = 50 # larger n gives smoother curves
b = [1.0 / n] * n # numerator coefficients
a = 1 # denominator coefficient
y_lf = lfilter(b, a, Y)
plt.plot(X, y_lf)
plt.show()
Check scipy lfilter documentation for implementation details regarding how numerator and denominator coefficients are used in the difference equations.
There are other filters in the scipy.signal package.
Finally, here is an example of radial basis function interpolation:
from scipy.interpolate import Rbf
rbf = Rbf(X, Y, function = 'multiquadric', smooth = 500)
y_rbf = rbf(X)
plt.plot(X, y_rbf)
plt.show()
Smoother approximation can be achieved by increasing the smooth
parameter. Alternative function
parameters to consider include 'cubic' and 'thin_plate'. When considering the function
value, I usually try 'thin_plate' first followed by 'cubic'; however both 'thin_plate' and 'cubic' seemed to struggle with the noise in this dataset.
Check other Rbf
options in the scipy docs. Scipy provides other univariate and multivariate interpolation techniques (see this tutorial).
Upvotes: 0
Reputation: 88118
In a previous answer, I was introduced to the Savitzky Golay filter, a particular type of low-pass filter, well adapted for data smoothing. How "smooth" you want your resulting curve to be is a matter of preference, and this can be adjusted by both the window-size and the order of the interpolating polynomial. Using the cookbook example for sg_filter
:
import numpy as np
import sg_filter
import matplotlib.pyplot as plt
# Generate some sample data similar to your post
X = np.arange(1,1000,1)
Y = np.log(X**3) + 10*np.random.random(X.shape)
Y2 = sg_filter.savitzky_golay(Y, 101, 3)
plt.plot(X,Y,linestyle='-', linewidth=2,alpha=.5)
plt.plot(X,Y2,color='r')
plt.show()
Upvotes: 8