Rafael Angarita

Reputation: 787

How to make this matplotlib plot less noisy?

How can I plot the following noisy data with a smooth, continuous line without considering each individual value? I would like to only show the behavior in a nicer way, without caring about noisy and extreme values. This is the code I am using:

import numpy
import sys
import matplotlib.pyplot as plt
from scipy.interpolate import spline

dataset = numpy.genfromtxt(fname='data', delimiter=",") 

dic = {}

for d in dataset:
    dic[d[0]] = d[1] 

plt.plot(range(len(dic)), dic.values(),linestyle='-', linewidth=2)

plt.savefig('plot.png')
plt.show()

Upvotes: 5

Views: 7017

Answers (2)

makeyourownmaker

Reputation: 1833

There is more than one way to do it!

Here I show how to reduce noise using a variety of techniques:

  1. Moving average
  2. LOWESS regression
  3. Low pass filter
  4. Interpolation

Sticking with @Hooked's example data for consistency:

import numpy as np
import matplotlib.pyplot as plt

X = np.arange(1, 1000, 1)
Y = np.log(X ** 3) + 10 * np.random.random(X.shape)

plt.plot(X, Y, alpha = .5)
plt.show()


  1. Moving average

Sometimes all you need is a moving average.

For example, using pandas with a window size of 100:

import pandas as pd

df = pd.DataFrame(Y, X)
df_mva = df.rolling(100).mean()  # moving average with a window size of 100

df_mva.plot(legend = False);

You will probably have to try several window sizes with your data. Note that the first 99 values of df_mva (window size minus one) will be NaN, but these can be removed with the dropna method.
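If the leading NaN values or the lag of a trailing window are a problem, a centered window is one option. This is a minimal sketch, assuming the same example data as above; the fixed seed and the names `s`, `s_trailing` and `s_centered` are only for illustration:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, only for reproducibility
X = np.arange(1, 1000, 1)
Y = np.log(X ** 3) + 10 * rng.random(X.shape)

s = pd.Series(Y, index=X)

# Trailing window: the first window - 1 = 99 results are NaN, so drop them
s_trailing = s.rolling(100).mean().dropna()

# Centered window; min_periods=1 averages whatever points are available
# near the edges, so no NaN values are produced
s_centered = s.rolling(100, center=True, min_periods=1).mean()

print(len(s), len(s_trailing), len(s_centered))
```

The edge values of the centered version are averaged over fewer points, so expect them to be noisier than the interior.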

See the pandas documentation for the rolling function for usage details.


  2. LOWESS regression

I've used LOWESS (Locally Weighted Scatterplot Smoothing) successfully to remove noise from repeated measures datasets. It belongs to the family of local regression methods, which also includes LOESS. It's a simple method with only one parameter to tune, which in my experience gives good results.

Here is how to apply the LOWESS technique using the statsmodels implementation:

import statsmodels.api as sm

y_lowess = sm.nonparametric.lowess(Y, X, frac = 0.3)  # 30 % lowess smoothing

plt.plot(y_lowess[:, 0], y_lowess[:, 1])  # some noise removed
plt.show()

It may be necessary to vary the frac parameter, which is the fraction of the data used when estimating each y value. Increase the frac value to increase the amount of smoothing. The frac value must be between 0 and 1.

See the statsmodels lowess documentation for further usage details.


  3. Low pass filter

SciPy provides a set of low pass filters which may be appropriate.

After applying lfilter:

from scipy.signal import lfilter

n = 50             # larger n gives smoother curves
b = [1.0 / n] * n  # numerator coefficients
a = 1              # denominator coefficient
y_lf = lfilter(b, a, Y)

plt.plot(X, y_lf)
plt.show()

Check the SciPy lfilter documentation for details on how the numerator and denominator coefficients are used in the difference equation.
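With n equal numerator coefficients and a = 1, the difference equation reduces to a trailing moving average. The sketch below checks that on the example data and, as an assumption about what you might want, also shows filtfilt, which runs the same filter forward and backward to avoid the lag:

```python
import numpy as np
from scipy.signal import lfilter, filtfilt

rng = np.random.default_rng(0)  # fixed seed, only for reproducibility
X = np.arange(1, 1000, 1)
Y = np.log(X ** 3) + 10 * rng.random(X.shape)

n = 50
b = np.ones(n) / n  # n equal taps, i.e. a moving average
a = 1

y_lf = lfilter(b, a, Y)

# Once the filter has seen n samples, each output is the mean of the
# last n inputs; np.convolve in 'valid' mode computes exactly those means
y_mean = np.convolve(Y, b, mode="valid")
same = np.allclose(y_lf[n - 1:], y_mean)
print("lfilter matches trailing mean:", same)

# Zero-phase alternative: filter forward, then backward
y_ff = filtfilt(b, a, Y)
```

The first n - 1 outputs of lfilter are computed as if the signal were zero before the first sample, which is why the filtered curve starts near zero.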

There are other filters in the scipy.signal package.


  4. Interpolation

Finally, here is an example of radial basis function interpolation:

from scipy.interpolate import Rbf

rbf = Rbf(X, Y, function = 'multiquadric', smooth = 500)
y_rbf = rbf(X)

plt.plot(X, y_rbf)
plt.show()

Smoother approximation can be achieved by increasing the smooth parameter. Alternative function parameters to consider include 'cubic' and 'thin_plate'. When considering the function value, I usually try 'thin_plate' first followed by 'cubic'; however both 'thin_plate' and 'cubic' seemed to struggle with the noise in this dataset.
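A quick way to see the effect of the smooth parameter is to compare smooth = 0 (plain interpolation through every point) with the value used above. This is only a sketch on the example data; the `roughness` helper is my own ad hoc measure, not part of SciPy:

```python
import numpy as np
from scipy.interpolate import Rbf

rng = np.random.default_rng(0)  # fixed seed, only for reproducibility
X = np.arange(1, 1000, 1)
Y = np.log(X ** 3) + 10 * rng.random(X.shape)


def roughness(y):
    """Mean absolute first difference: smaller means a smoother curve."""
    return np.abs(np.diff(y)).mean()


rough = {}
for smooth in (0, 500):
    rbf = Rbf(X, Y, function="multiquadric", smooth=smooth)
    rough[smooth] = roughness(rbf(X))
    print(f"smooth={smooth}: roughness={rough[smooth]:.4f}")
```

With smooth = 0 the result follows the noisy data, so its roughness should be close to that of Y itself.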

Check other Rbf options in the scipy docs. Scipy provides other univariate and multivariate interpolation techniques (see this tutorial).

Upvotes: 0

Hooked

Reputation: 88118

In a previous answer, I was introduced to the Savitzky-Golay filter, a particular type of low-pass filter that is well suited to data smoothing. How "smooth" you want the resulting curve to be is a matter of preference, and it can be adjusted through both the window size and the order of the interpolating polynomial. Using the cookbook example for sg_filter:

import numpy as np
import sg_filter
import matplotlib.pyplot as plt


# Generate some sample data similar to your post
X = np.arange(1,1000,1)
Y = np.log(X**3) + 10*np.random.random(X.shape)

Y2 = sg_filter.savitzky_golay(Y, 101, 3)

plt.plot(X,Y,linestyle='-', linewidth=2,alpha=.5)
plt.plot(X,Y2,color='r')

plt.show()

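Note that sg_filter in the code above is the cookbook recipe saved as a local module, not an installable package. Newer SciPy releases include a comparable filter as scipy.signal.savgol_filter. A minimal sketch with the same window size and polynomial order, assuming SciPy's default edge handling is acceptable (it may differ from the cookbook version near the ends):

```python
import numpy as np
from scipy.signal import savgol_filter

rng = np.random.default_rng(0)  # fixed seed, only for reproducibility
X = np.arange(1, 1000, 1)
Y = np.log(X ** 3) + 10 * rng.random(X.shape)

# Window of 101 samples, cubic polynomial fitted within each window
Y2 = savgol_filter(Y, window_length=101, polyorder=3)
```

Y2 has the same length as Y, so it can be plotted against X exactly as in the example above.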

Upvotes: 8
