Gabriel
Gabriel

Reputation: 42329

How to obtain a weighted gaussian filter

I have a set of weighted x,y points, like shown below (the full set is here):

#  x       y     w
-0.038  2.0127  0.71
0.058   1.9557  1
0.067   2.0016  0.9
0.072   2.0316  0.83
...

I need to find a smoothed line that adjusts these points according to the importance assigned to each, ie: more weight means the data point should have more relevance.

This is the code I have so far, which basically applies a gaussian_filter1d to the data (I got the idea from this question: line smoothing algorithm in python?):

import matplotlib.pyplot as plt
import numpy as np
from scipy.ndimage import gaussian_filter1d

# Read data from file.
data = np.loadtxt('data_file', unpack=True)
x, y, w = data[0], data[1], data[2]

# Return evenly spaced numbers over a specified interval.
t = np.linspace(0, 1, len(x))
t2 = np.linspace(0, 1, 100)    
# One-dimensional linear interpolation.
x2 = np.interp(t2, t, x)
y2 = np.interp(t2, t, y)

# Obtain Gaussian filter with fixed sigma value.
sigma = 7
x3 = gaussian_filter1d(x2, sigma)
y3 = gaussian_filter1d(y2, sigma)

# Make plot.
cm = plt.cm.get_cmap('RdYlBu')
plt.scatter(x, y, marker="o", c=w, s=40, cmap=cm, lw=0.5, vmin=0, vmax=1)
plt.plot(x3, y3, "r", lw=2)
plt.show()

This code produces the following plot (bluer dots have a higher weight value):

plot

The problem is that this fit does not consider the weights assigned to each point. How can I introduce that information into the gaussian filter?

Upvotes: 4

Views: 6362

Answers (1)

Developer
Developer

Reputation: 8400

Note that the following idea is workaround not an exact solution, but it is worth to try.

The idea is to use w weight parameter to repeat corresponding values in x and y. So if you scale w for example into range [1,10] all corresponding values in x and so in y will be duplicated 10 times for w equal to 10. That is, new x, y will be created. In this way we incorporate the weight as frequency of values in x and y, indeed. Having this done, feeding the new ones to your algorithm hopefully gives you desired results as shown in the worked examples below.

  • For the first figure, blue-to-red spectrum correspond to lower-to-high weights. Numbers of title are the duplicating factor as described above.
  • For the second figure, your data, we didn't touch your color-format.

enter image description here

enter image description here

Upvotes: 5

Related Questions