cbontoiu

Reputation: 63

Data filtering with Python

I have two lists that hold the coordinates of points (x, y). Each x value has a single y value, and plotting them as a line gives the blue graph. However, y varies rapidly and the signal is noisy, although it clearly shows some overall repetition. I need to extract a smoother line, like the one drawn in red. Could anyone suggest how to filter the raw data and perhaps apply some regression afterwards?

This is the code and the data for the first 20 points:

import matplotlib.pyplot as plt
x = [0.00, 0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90, 1.00, 1.10, 1.20, 1.30, 1.40, 1.50, 1.60, 1.70, 1.80, 1.90]
y = [-5.39, -11.86, -14.46, -12.73, -12.74, -8.17, -3.00, -9.63, -6.86, -2.59, -7.98, -8.31, -6.62, -4.68, -7.23, -4.10, -5.43, -7.89, -7.23, -6.10]
fig, axes = plt.subplots(nrows=1, ncols=1, figsize=(10, 5))
axes.set(xlabel='x', ylabel='y')
axes.plot(x, y, linewidth=2, color='dodgerblue')
axes.scatter(x, y, color='lime', s=25)

Complete x and y data can be found at this link as two columns of text separated by a single space:

https://drive.google.com/file/d/1pI4sA20BgBGEjHCkSL5TaeSBGvBORwld/view?usp=sharing

Thank you.

Upvotes: 0

Views: 1103

Answers (3)

Corralien

Reputation: 120489

You can use the lowess function from the statsmodels package:

# Python env: pip install statsmodels
# Anaconda env: conda install statsmodels

import numpy as np
import matplotlib.pyplot as plt
from statsmodels.nonparametric.smoothers_lowess import lowess

# Your data
# x = ...
# y = ...

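# Tune as needed: frac is the fraction of the data used for each local fit,
# it is the number of robustifying (residual-based reweighting) iterations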
z = lowess(y, x, frac=0.025, it=0, return_sorted=False)

plt.plot(x, y)
plt.plot(x, z)
plt.show()

Upvotes: 1

Marcello Zago

Reputation: 726

You could try a Gaussian filter to attenuate the high-frequency content of the data and smooth the curve. Here is an example:

import matplotlib.pyplot as plt
from scipy.ndimage import gaussian_filter

x = [0.00, 0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90, 1.00, 1.10, 1.20, 1.30, 1.40, 1.50, 1.60, 1.70, 1.80, 1.90]
y = [-5.39, -11.86, -14.46, -12.73, -12.74, -8.17, -3.00, -9.63, -6.86, -2.59, -7.98, -8.31, -6.62, -4.68, -7.23, -4.10, -5.43, -7.89, -7.23, -6.10]

# smoothing with a Gaussian filter (a larger sigma gives a smoother curve)
y = gaussian_filter(y, sigma=1)

fig, axes = plt.subplots(nrows=1, ncols=1, figsize=(10, 5))
axes.set(xlabel='x', ylabel='y')
# line plot of the smoothed data, with a marker at each point
axes.plot(x, y, linewidth=2, color='dodgerblue')
axes.scatter(x, y, color='lime', s=25)

plt.show()
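
To get a feel for what sigma does, here is a minimal NumPy-only sketch of the same idea: convolve the signal with a normalized Gaussian kernel. The helper name gaussian_smooth and its truncate and edge-padding choices are my own assumptions for illustration; this is not scipy's implementation, although the mirrored edge padding is chosen to resemble its default behaviour.

```python
import numpy as np

def gaussian_smooth(y, sigma, truncate=4.0):
    """Hypothetical helper: smooth a 1-D signal with a normalized Gaussian kernel."""
    y = np.asarray(y, dtype=float)
    # Kernel half-width in samples; the kernel is cut off at truncate * sigma
    radius = int(truncate * sigma + 0.5)
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    kernel /= kernel.sum()  # weights sum to 1, so a constant signal is unchanged
    # Mirror the signal at both ends so the output has the same length as the input
    padded = np.pad(y, radius, mode="symmetric")
    return np.convolve(padded, kernel, mode="valid")

# The 20 sample points from the question
y = [-5.39, -11.86, -14.46, -12.73, -12.74, -8.17, -3.00, -9.63, -6.86, -2.59,
     -7.98, -8.31, -6.62, -4.68, -7.23, -4.10, -5.43, -7.89, -7.23, -6.10]

# Larger sigma averages over more neighbours and flattens the curve further
for sigma in (1, 2, 4):
    smoothed = gaussian_smooth(y, sigma)
    print(sigma, float(np.std(smoothed)))
```

Note that sigma is measured in samples, not in units of x, so with a spacing of 0.10 a sigma of 4 corresponds to a width of roughly 0.4 along the x axis.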

Upvotes: 0

mozway

Reputation: 261944

Here is a quick example of how to use rolling + mean (a moving average) to smooth your data:

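import pandas as pd

# Read the two space-separated columns into a DataFrame
df = pd.read_csv('across-Z-along-Y-1D-Ecd_x_150.00_nm_iter_17280_time_3_335040_fs_Snapshot.dat',
                 delimiter=' ', names=['x', 'y'])
# Centered moving average over 50 samples
df['y2'] = df['y'].rolling(window=50, center=True).mean()
ax = df.plot('x', 'y')
df.plot('x', 'y2', ax=ax)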
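
One thing to be aware of: with the default settings, a rolling mean returns NaN wherever the window is not completely filled, so the smoothed curve is missing near both ends. A possible workaround is min_periods=1, which averages over whatever points are available. The sketch below uses made-up synthetic data (a noisy sine, not the real file) and hypothetical column names, only to show the difference.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real file: a noisy sine (an assumption for illustration)
rng = np.random.default_rng(0)
x = np.linspace(0, 20, 200)
y = -7 + 3 * np.sin(x) + rng.normal(scale=2.0, size=x.size)
df = pd.DataFrame({"x": x, "y": y})

# Default: a value is produced only where all 50 samples are available,
# which leaves window - 1 NaN values split between the two ends
df["y_default"] = df["y"].rolling(window=50, center=True).mean()

# min_periods=1: near the edges, average over however many samples exist
df["y_edges"] = df["y"].rolling(window=50, center=True, min_periods=1).mean()

print(df["y_default"].isna().sum())
print(df["y_edges"].isna().sum())
```

The edge values obtained with min_periods=1 are averaged over fewer points, so they are noisier than the interior of the curve; whether that is acceptable depends on what you do with the result.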

Upvotes: 0