Roman Kazmin
Roman Kazmin

Reputation: 981

Find closest elements of two lists - the best way

I have two lists(or columns of dataframe - doesn't matter) of numerical values:

L1 = [1, 0.5, 3, 7, 4.7]
L2 = [2, 0.4, 8, 0.3, 5]

for example. I need to correlate these lists and find pairs(indexOfElementFromL1, indexOfElementFromL2) which corresponds min diff of two elements. For example, for my example it should be: (0,1), (1,1), (2,0), (3,2), (4,4). Really what I want - find closest element L2 to each one of L1. Of course I could naive approch like:

for el1 in L1:
  for el2 in L2:
    ....

But I'd to like to see more correct solution

Upvotes: 0

Views: 2126

Answers (4)

aerobiomat
aerobiomat

Reputation: 3437

You can use brute force methods using for loops or taking advantage of broadcasting, as you can see in other answers.

The problem comes when the lists are large. If that's the case, for loops will be slow and broadcasting will take a lot of memory.

The alternative is to use a search algorithm like FLANN:

>>> import numpy as np
>>> from pyflann import FLANN

>>> L1 = np.array([[1, 0.5, 3, 7, 4.7]]).T      # Data points as rows
>>> L2 = np.array([[2, 0.4, 8, 0.3, 5]]).T

>>> idx, _ = FLANN().nn(pts=L2, qpts=L1)
>>> idx
array([1, 1, 0, 2, 4], dtype=int32)

>>> list(enumerate(idx))
[(0, 1), (1, 1), (2, 0), (3, 2), (4, 4)]

Or, if you want to perform multiple queries:

>>> flann = FLANN()
>>> flann.build_index(pts=L2)           # Build search index
>>> idx, _ = flann.nn_index(qpts=L1)    # Query

Upvotes: 1

user17242583
user17242583

Reputation:

You can do it really fast by converting your lists to numpy arrays and using numpy broadcasting:

a1 = np.array(L1)
a2 = np.array(L2)

indexes = abs(a1[:, None] - a2).argmin(axis=1)
out = list(enumerate(indexes))

Output:

>>> indexes
array([1, 1, 0, 2, 4])

>>> out
[(0, 1), (1, 1), (2, 0), (3, 2), (4, 4)]

Upvotes: 5

Shawn Ramirez
Shawn Ramirez

Reputation: 833

taken from another answer; though I do not know how to flag for such:

Find nearest value in numpy array

import numpy as np

L1 = [1, 0.5, 3, 7, 4.7]
L2 = [2, 0.4, 8, 0.3, 5]

def find_nearest(array, value):
    array = np.asarray(array)
    idx = (np.abs(array - value)).argmin()
    return array[idx]

for el1 in L1:
     print(find_nearest(L2, el1))

Upvotes: 2

Tobi208
Tobi208

Reputation: 1376

I don't see how an answer could be 'more correct', but this is how I'd do it. I like it more than a cramped one liner because it's still readable.

L1 = [1, 0.5, 3, 7, 4.7]
L2 = [2, 0.4, 8, 0.3, 5]


def find_closest(x):
    l2_diffs = [abs(x - y) for y in L2]
    return l2_diffs.index(min(l2_diffs))


res = [(i, find_closest(x)) for i, x in enumerate(L1)]

print(res)

Output:

[(0, 1), (1, 1), (2, 0), (3, 2), (4, 4)]

Upvotes: 1

Related Questions