Joe Flip

Reputation: 1195

Python: Find list that most closely matches input list value by value

I have a given list of values and a collection of lists (lists A, B, and C) with similar values. I'm trying to find a way to return the list that most closely matches the given list. I'd like to use a least squares fit as the distance metric.

given = [0, 1, 2, 3, 4, 5]
A = [0.1, 0.9, 2, 3.3, 3.6, 5.1]
B = [-0.1, 0.9, 2.1, 3.1, 3.9, 5]
C = [0, 1.1, 2, 2.9, 4, 5.1]

So in this case, it would return C as the closest match to given.

I thought I could incorporate something like:

match = [min([val[idx] for val in [A,B,C]], key=lambda x: abs(x-given[idx])) for idx in range(len(given))]

But that only returns the closest value for each list element. I'm not sure how to then identify list C as the closest point-by-point match.

Also, if the lists have different lengths, I don't know how to compare them, since an index-by-index comparison no longer applies. For example:

given = [0, 1, 2, 3, 4, 5]
A = [0.1, 0.9, 2, 3.3, 3.6, 2, 5.1, 3, 6.8, 7.1, 8.2, 9]
B = [-0.1, 0.9, 2.1, 3.1, 3.9]
C = [-1.7, -1, 0, 1.1, 2, 2.9, 4, 5.1, 6, 7.1, 8]

This should still return C as the closest match.

I'm also using Numpy but haven't found anything useful. Any help would be greatly appreciated!

Upvotes: 0

Views: 778

Answers (2)

Fred

Reputation: 26

You can use the sum of the squared errors. I made a quick example:

from copy import copy

def squaredError(a, b):
    r = copy(a)

    for i in range(len(a)):
        r[i] -= b[i]
        r[i] *= r[i]

    return sum(r)

given = [0, 1, 2, 3, 4, 5]
A = [0.1, 0.9, 2, 3.3, 3.6, 5.1]
B = [-0.1, 0.9, 2.1, 3.1, 3.9, 5]
C = [0, 1.1, 2, 2.9, 4, 5.1]

print(squaredError(given, A))
print(squaredError(given, B))
print(squaredError(given, C))

# Pair each list with its error; min() compares the error first
match = min(map(lambda x: (squaredError(given, x), x), [A,B,C]))[1]
print(match)
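Note that squaredError indexes both lists by position, so it assumes they have the same length. For the unequal-length case in the question, one possible reading (an assumption, since the question does not define it) is to slide the shorter list along the longer one and keep the best-aligned contiguous window, using the mean squared error so that windows of different sizes stay comparable. A rough sketch, with the helper names mean_squared_error and best_window_error made up for illustration:

```python
def mean_squared_error(xs, ys):
    """Mean of squared differences between two equal-length, non-empty sequences."""
    return sum((x - y) ** 2 for x, y in zip(xs, ys)) / float(len(xs))


def best_window_error(given, candidate):
    """Smallest mean squared error over all alignments of the shorter
    sequence against contiguous windows of the longer one."""
    short, long_ = sorted((given, candidate), key=len)
    n = len(short)
    return min(
        mean_squared_error(short, long_[start:start + n])
        for start in range(len(long_) - n + 1)
    )


given = [0, 1, 2, 3, 4, 5]
candidates = {
    "A": [0.1, 0.9, 2, 3.3, 3.6, 2, 5.1, 3, 6.8, 7.1, 8.2, 9],
    "B": [-0.1, 0.9, 2.1, 3.1, 3.9],
    "C": [-1.7, -1, 0, 1.1, 2, 2.9, 4, 5.1, 6, 7.1, 8],
}

# Pick the name whose best-aligned window is closest to `given`
closest = min(candidates, key=lambda name: best_window_error(given, candidates[name]))
print(closest)  # C for the data above
```

This only considers contiguous windows. If the matching values may be scattered through the longer list, a different approach would be needed.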

Upvotes: 1

mgilson

Reputation: 309831

A pure Python solution isn't the most efficient, but here's one implementation that uses the sum of squared differences as the distance metric.

def distance(x,y):
    return sum( (a-b)**2 for a,b in zip(x,y) )

given = [0, 1, 2, 3, 4, 5]
A = [0.1, 0.9, 2, 3.3, 3.6, 5.1]
B = [-0.1, 0.9, 2.1, 3.1, 3.9, 5]
C = [0, 1.1, 2, 2.9, 4, 5.1]

print(min((A,B,C), key=lambda x: distance(x, given)))  # prints list C

Assuming np.ndarrays of the same size, distance could be written as:

def distance(x,y):
    return ((x-y)**2).sum()
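If all candidates have the same length, the selection itself can probably be vectorized too. A minimal sketch, assuming the candidates are stacked into one 2-D array (the names candidates, errors, and best are just for illustration):

```python
import numpy as np

given = np.array([0, 1, 2, 3, 4, 5], dtype=float)
candidates = np.array([
    [0.1, 0.9, 2, 3.3, 3.6, 5.1],    # A
    [-0.1, 0.9, 2.1, 3.1, 3.9, 5],   # B
    [0, 1.1, 2, 2.9, 4, 5.1],        # C
])

# Broadcasting subtracts `given` from every row; sum the squares per row
errors = ((candidates - given) ** 2).sum(axis=1)

# Index of the row with the smallest sum of squared differences
best = int(np.argmin(errors))
print("ABC"[best])  # C
```

Here errors holds one sum of squared differences per candidate, so np.argmin returns the row index of the closest list.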

Upvotes: 1
