Reputation: 1195
I have a given list of values and a collection of lists (A, B, and C) with similar values. I'm trying to find a way to return the list that most closely matches the given list. I'd like to use a least squares fit as the distance metric.
given = [0, 1, 2, 3, 4, 5]
A = [0.1, 0.9, 2, 3.3, 3.6, 5.1]
B = [-0.1, 0.9, 2.1, 3.1, 3.9, 5]
C = [0, 1.1, 2, 2.9, 4, 5.1]
So in this case, it would return C as the closest match to given.
I thought I could incorporate something like:
match = [min([val[idx] for val in [A,B,C]], key=lambda x: abs(x-given[idx])) for idx in range(len(given))]
But that only returns the closest value for each list element. I'm not sure how to then identify list C as the closest point-by-point match.
Also, if the lists are different lengths, I really don't know what to do if I'm not comparing them index by index. For example:
given = [0, 1, 2, 3, 4, 5]
A = [0.1, 0.9, 2, 3.3, 3.6, 2, 5.1, 3, 6.8, 7.1, 8.2, 9]
B = [-0.1, 0.9, 2.1, 3.1, 3.9]
C = [-1.7, -1, 0, 1.1, 2, 2.9, 4, 5.1, 6, 7.1, 8]
This would still return C as the closest match.
I'm also using NumPy but haven't found anything useful there. Any help would be greatly appreciated!
Upvotes: 0
Views: 778
Reputation: 26
You can use the sum of the squared errors. I made a quick example:
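from copy import copy

def squaredError(a, b):
    # Work on a shallow copy so the caller's list is not modified.
    r = copy(a)
    for i in range(len(a)):
        r[i] -= b[i]   # element-wise difference
        r[i] *= r[i]   # square it in place
    return sum(r)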
given = [0, 1, 2, 3, 4, 5]
A = [0.1, 0.9, 2, 3.3, 3.6, 5.1]
B = [-0.1, 0.9, 2.1, 3.1, 3.9, 5]
C = [0, 1.1, 2, 2.9, 4, 5.1]
print(squaredError(given, A))
print(squaredError(given, B))
print(squaredError(given, C))
# Pair each list with its error; min() compares the error first.
match = min(map(lambda x: (squaredError(given, x), x), [A,B,C]))[1]
print(match)
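If you also want to know which list won (the name "C" rather than just its values), one option is to keep the lists in a dictionary. This is only a minimal sketch: the `candidates` dict and the `squared_error` helper are names I made up for illustration, and it assumes all lists have the same length as `given`.

```python
def squared_error(a, b):
    """Sum of squared element-wise differences (equal-length sequences assumed)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

given = [0, 1, 2, 3, 4, 5]
# Hypothetical container: maps a label to each candidate list.
candidates = {
    "A": [0.1, 0.9, 2, 3.3, 3.6, 5.1],
    "B": [-0.1, 0.9, 2.1, 3.1, 3.9, 5],
    "C": [0, 1.1, 2, 2.9, 4, 5.1],
}

# Score every candidate once, then pick the label with the smallest score.
scores = {name: squared_error(given, values) for name, values in candidates.items()}
best_name = min(scores, key=scores.get)
print(best_name, candidates[best_name])
```

Using `key=` also avoids comparing the lists themselves when two candidates happen to have the same error.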
Upvotes: 1
Reputation: 309831
A pure-Python solution isn't the most efficient, but here's one implementation using the sum of squared differences as the distance metric.
def distance(x,y):
    return sum( (a-b)**2 for a,b in zip(x,y) )
given = [0, 1, 2, 3, 4, 5]
A = [0.1, 0.9, 2, 3.3, 3.6, 5.1]
B = [-0.1, 0.9, 2.1, 3.1, 3.9, 5]
C = [0, 1.1, 2, 2.9, 4, 5.1]
min((A,B,C),key=lambda x:distance(x,given))
Assuming np.ndarrays of the same size, distance could be written as:
def distance(x,y):
    return ((x-y)**2).sum()
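For lists of different sizes, the question doesn't say how they should be aligned, so the following is only a sketch under an assumption of mine: slide the shorter sequence along the longer one and keep the best contiguous alignment. It uses the mean (rather than the sum) of the squared differences so that overlaps of different lengths stay comparable. The names `best_window_mse` and `candidates` are hypothetical.

```python
import numpy as np

def best_window_mse(given, candidate):
    """Smallest mean squared error over all contiguous alignments.

    Assumption: the shorter sequence is slid along the longer one.
    """
    x = np.asarray(given, dtype=float)
    y = np.asarray(candidate, dtype=float)
    short, long_ = (x, y) if len(x) <= len(y) else (y, x)
    n = len(short)
    # Compare the short sequence against every length-n window of the long one.
    return min(((long_[i:i + n] - short) ** 2).mean()
               for i in range(len(long_) - n + 1))

given = [0, 1, 2, 3, 4, 5]
candidates = {
    "A": [0.1, 0.9, 2, 3.3, 3.6, 2, 5.1, 3, 6.8, 7.1, 8.2, 9],
    "B": [-0.1, 0.9, 2.1, 3.1, 3.9],
    "C": [-1.7, -1, 0, 1.1, 2, 2.9, 4, 5.1, 6, 7.1, 8],
}

scores = {name: best_window_mse(given, values) for name, values in candidates.items()}
best = min(scores, key=scores.get)
print(best)
```

With the lists from the question this picks C, whose window starting at index 2 lines up with given. Whether a sliding window is the right notion of "closest" depends on your data; if elements can be missing in the middle of a list, something like dynamic time warping may be a better fit.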
Upvotes: 1