Reputation: 1341
I have two lists of data points:
list_x = [-10, -9, -8, -7, -6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49]
list_y = [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
When I plot them, the graph looks like this:
import matplotlib.pyplot as plt
plt.plot(list_x, list_y)
plt.show()
Based on these data points, is there a way to produce a smooth graph like the one below and get its equation?
===========================================================
I have tried using the solution from here, and it produces a graph that is not smooth.
from scipy.interpolate import spline
import numpy as np
list_x_new = np.linspace(min(list_x), max(list_x), 1000)
list_y_smooth = spline(list_x, list_y, list_x_new)
plt.plot(list_x_new, list_y_smooth)
plt.show()
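Note for readers on current SciPy: scipy.interpolate.spline was deprecated and removed in later releases, so the snippet above no longer imports. A minimal sketch of a comparable interpolating spline, assuming make_interp_spline is an acceptable stand-in (it is not the function used in the question):

```python
import numpy as np
from scipy.interpolate import make_interp_spline

# Same data as in the question (list_x is -10 .. 49).
list_x = list(range(-10, 50))
list_y = [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

# Cubic interpolating B-spline: it passes through every data point.
interp = make_interp_spline(list_x, list_y, k=3)

list_x_new = np.linspace(min(list_x), max(list_x), 1000)
list_y_interp = interp(list_x_new)

# Largest deviation from the original points (expected to be ~0).
max_knot_error = np.max(np.abs(interp(list_x) - np.asarray(list_y)))
print(max_knot_error)
```

Because an interpolating spline is forced through every quantized point, the resulting curve still follows the steps in the data, which is consistent with the "not smooth" result described above.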
Upvotes: 18
Views: 44079
Reputation: 1833
Here are three more curve-smoothing options: a Savitzky-Golay filter, LOWESS, and a moving-average filter applied with lfilter.
First, recreate the original plot:
import matplotlib.pyplot as plt
list_x = [-10, -9, -8, -7, -6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49]
list_y = [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
plt.plot(list_x, list_y)
plt.show()
The Savitzky-Golay technique fits subsets (windows) of adjacent points to low order polynomials using least squares.
How to apply the Savitzky-Golay filter:
from scipy.signal import savgol_filter
window = 21
order = 2
y_sf = savgol_filter(list_y, window, order)
plt.plot(list_x, y_sf)
plt.show()
The window and order parameters make this filter quite adaptable.
Read more about using this filter in the scipy documentation.
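To make the "least-squares fit over a window" description concrete, here is a minimal sketch that recomputes one interior point by hand with np.polyfit and compares it with the savgol_filter output. The index i = 30 is an arbitrary choice for illustration, and the check only applies to interior points (edges are handled differently by the filter).

```python
import numpy as np
from scipy.signal import savgol_filter

list_y = [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
y = np.asarray(list_y, dtype=float)

window, order = 21, 2
half = window // 2
y_sf = savgol_filter(y, window, order)

# Pick an interior point (arbitrary) and fit a degree-2 polynomial
# to the 21 samples centred on it, using offsets -10 .. 10 as x.
i = 30
offsets = np.arange(-half, half + 1)
coeffs = np.polyfit(offsets, y[i - half:i + half + 1], order)

# The smoothed value is the fitted polynomial evaluated at the centre.
manual_value = np.polyval(coeffs, 0.0)
print(manual_value, y_sf[i])
```

A wider window averages over more neighbours and smooths more; a higher order lets the local polynomial follow the data more closely.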
LOWESS (locally weighted scatterplot smoothing) is a local regression method. In my experience it is simple to tune and often gives great results.
How to apply the LOWESS smoother:
import statsmodels.api as sm
y_lowess = sm.nonparametric.lowess(list_y, list_x, frac = 0.30) # 30 % lowess smoothing
plt.plot(y_lowess[:, 0], y_lowess[:, 1])
plt.show()
It may be possible to improve the approximation by varying the frac parameter, which is the fraction of the data used when estimating each y value. Increase the frac value to increase the amount of smoothing. The frac value must be between 0 and 1.
Further details on statsmodels lowess usage.
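The third option is lfilter. With the coefficients below (n equal numerator weights and a denominator of 1) it acts as a simple moving average over the last n samples: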
from scipy.signal import lfilter
n = 15 # larger n gives smoother curves
b = [1.0 / n] * n # numerator coefficients
a = 1 # denominator coefficient
y_lf = lfilter(b, a, list_y)
plt.plot(list_x, y_lf)
plt.show()
Check scipy lfilter documentation for implementation details regarding how numerator and denominator coefficients are used in the difference equations.
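As a rough illustration of what those coefficients do: with a = 1 and b = [1/n] * n, the filter output is a causal moving average that assumes zeros before the first sample. The sketch below checks this against np.convolve and, as an assumption about what may be wanted rather than part of the original answer, also runs filtfilt, which applies the same filter forwards and backwards to avoid the shift to the right.

```python
import numpy as np
from scipy.signal import lfilter, filtfilt

list_y = [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
y = np.asarray(list_y, dtype=float)

n = 15
b = np.ones(n) / n  # numerator coefficients: equal weights
a = 1               # denominator coefficient: no feedback term

# Causal moving average: y_lf[k] = (1/n) * sum(y[k-n+1 .. k]),
# with samples before index 0 treated as zero.
y_lf = lfilter(b, a, y)

# The same thing written as a truncated convolution.
y_conv = np.convolve(y, b)[:len(y)]

# Forward-backward filtering: no lag, but the smoothing is applied twice.
y_ff = filtfilt(b, a, y)

print(y_lf[:3])  # start-up transient: early outputs are pulled toward zero
```

The start-up transient and the delay are examples of the edge effects mentioned below.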
There are other filters in the scipy.signal package.
Care must be taken to avoid over-smoothing with all these approaches.
Additionally, some of these methods may have unexpected edge effects.
Upvotes: 7
Reputation: 17169
One easy option, echoing the suggestion from Davis Herring, is to use a polynomial approximation of the data:
import numpy as np
import matplotlib.pyplot as plt
plt.figure()
poly = np.polyfit(list_x,list_y,5)
poly_y = np.poly1d(poly)(list_x)
plt.plot(list_x,poly_y)
plt.plot(list_x,list_y)
plt.show()
Notice the oscillation at the right end of the plot, which is not present in the original data; it is an artifact of the polynomial approximation.
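Since the question also asks for the equation of the curve, here is a minimal sketch showing that the fitted coefficients are that equation. The rmse value is only an illustrative goodness-of-fit measure, not something from the original answer.

```python
import numpy as np

list_x = list(range(-10, 50))
list_y = [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

# Coefficients are ordered from the highest power (x**5) down to the constant.
coeffs = np.polyfit(list_x, list_y, 5)
poly = np.poly1d(coeffs)

# Printing a poly1d shows the polynomial in readable form.
print(poly)

# Root-mean-square error of the fit against the original points.
rmse = np.sqrt(np.mean((poly(list_x) - np.asarray(list_y)) ** 2))
print(rmse)
```

The polynomial can then be evaluated at any x with poly(x), although values far outside the fitted range should not be trusted.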
Spline approximation, as suggested by Davis, is another good option. By varying the smoothness parameter s you can strike a different balance between smoothness and closeness to the original data.
from scipy.interpolate import splrep, splev
plt.figure()
bspl = splrep(list_x,list_y,s=5)
bspl_y = splev(list_x,bspl)
plt.plot(list_x,list_y)
plt.plot(list_x,bspl_y)
plt.show()
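To get a feel for the s parameter, the sketch below fits the same data with a few values and reports the number of knots and the sum of squared residuals. The values 0, 5 and 50 are arbitrary choices for illustration; s=0 forces interpolation, and larger values allow the spline to deviate further from the points.

```python
import numpy as np
from scipy.interpolate import splrep, splev

x = np.arange(-10, 50, dtype=float)
y = np.array([1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], dtype=float)

results = {}
for s in (0, 5, 50):
    # tck = (knots, B-spline coefficients, degree)
    tck = splrep(x, y, s=s)
    # Sum of squared residuals; splrep keeps this at or below s.
    residual = np.sum((splev(x, tck) - y) ** 2)
    results[s] = (len(tck[0]), residual)
    print(f"s={s}: knots={len(tck[0])}, residual={residual:.3f}")
```

The returned tck tuple is the closest thing to an "equation" for the spline: it defines a piecewise cubic that splev can evaluate at any x inside the data range.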
Upvotes: 19
Reputation: 39818
Because your data is approximate (i.e., it has been quantized), you want an approximating spline, not an interpolating spline.
Upvotes: 1