Reputation: 957
Given a list of tuples, [(x1, y1), (x2, y2) ... (xm, ym)]
such as [(1, 2), (3, 7), (5, 9)]
I would like to write a function that fills in the missing integer values x with the average of the neighbor values f(x - 1), f(x + 1).
In this case, we would get:
[(1, 2), (2, ave(2, 7)), (3, 7), (4, ave(7, 9)), (5, 9)]
import numpy as np
# calculating nearest neighbor averages
def nearest(x, y):
# define the min and max for our line
min = np.amin(x)
max = np.amax(x)
# fill in the gaps
numsteps = max - min + 1
# an empty vessel
new_df = []
# an empty vessel for our xs
xs = np.linspace(min, max, numsteps)
for i, item in enumerate(xs):
if(xs[i] in x):
idx = x.index(xs[i])
new_df.insert(i, (xs[i], y[idx]))
else:
idx = x.index(xs[i] - 1)
idx2 = x.index(xs[i] + 1)
avg = (y[idx] + y[idx2])/2.0
new_df.insert(i, (xs[i], avg))
print new_df
nearest([1, 3, 5], [6, 7, 8])
// [(1.0, 6), (2.0, 6.5), (3.0, 7), (4.0, 7.5), (5.0, 8)]
This quickly fails, however, with an array such as xs = [1, 4, 7]
, since the values are more than one away from each other. In that case, given the same ys = [2, 7, 9]
, we would expect the answer to either be:
[(1, 2), (2, ave(2, 7)), (3, ave(2,7)), (4, 7) ... ]
or
Something a bit more complicated:
[(1, 2), (2, ave(prev, next_that_exists)), (3, ave(just_created, next_that exists), ...]
How can I implement so that we find the elements just below the missing one and just above the missing one, and compute their average?
Also, is this different from a moving average?
Upvotes: 2
Views: 901
Reputation: 40733
Here is my approach: from the input, create a dictionary with the first list as the key and the second list as value. Then create a function, get_value()
to get the value, calculate it if needed.
def get_value(pairs, key):
try:
return pairs[key]
except KeyError:
previous_value = get_value(pairs, key -1)
next_value = get_value(pairs, key + 1)
return (previous_value + next_value) / 2.0
def nearest(x, y):
pairs = dict(zip(x, y))
for i in range(1, max(x) + 1):
yield i, get_value(pairs, i)
print list(nearest([1, 3, 5], [6, 7, 8]))
I now have a chance to revisit this question. Based on your description, you want to interpolate the missing values. Since you already have numpy
installed, why not use it?
import numpy as np
def nearest(x, y):
all_x = range(min(x), max(x) + 1)
return zip(all_x, np.interp(all_x, x, y))
print nearest([1, 3, 5], [6, 7, 8])
print nearest([1, 4, 7], [6, 7, 8])
Output:
[(1, 6.0), (2, 6.5), (3, 7.0), (4, 7.5), (5, 8.0)]
[(1, 6.0), (2, 6.333333333333333), (3, 6.666666666666667), (4, 7.0), (5, 7.333333333333333), (6, 7.666666666666667), (7, 8.0)]
The numpy.interp
does all the heavy lifting, function nearest only need to figure out a list of all the x
values.
Upvotes: 1
Reputation: 229361
This should work:
def nearest(x, y):
assert len(x) == len(y)
res = []
for i in xrange(len(x)-1):
res.append((x[i], y[i]))
gap = x[i+1] - x[i]
for j in xrange(1, gap):
res.append((x[i]+j, y[i] + j * (y[i+1]-y[i]) / float(gap)))
res.append((x[-1], y[-1]))
return res
Sample output:
print nearest([1, 3, 5], [2, 7, 9])
print nearest([1, 4, 7], [2, 7, 9])
Gives:
[(1, 2), (2, 4.5), (3, 7), (4, 8.0), (5, 9)]
[(1, 2), (2, 3.666666666666667), (3, 5.333333333333334), (4, 7), (5, 7.666666666666667), (6, 8.333333333333334), (7, 9)]
Explanation:
I solved the [1, 4]
, [2, 7]
case by hand, noting that the values we want are 2, x, y, 7
where
x = (2 + y) / 2
y = (x + 7) / 2
I got x = 11/3
and y = 16/3
, yielding:
6/3, 11/3, 16/3, 21/3
Note that the gap between each of these is 5/3
, or (7-2) / (4-1)
. That's when I realized that by wanting to fill in with the average of the neighbor values across larger gaps, you basically want a linear interpolation from one value to the next over the given number of steps. That is, for example, given you want to go from 2
to 7
in 3
steps, you add 5/3
to 2
repeatedly until you get to 7
.
Upvotes: 1