Lotzki
Lotzki

Reputation: 529

Vectorizing nearest neighbor computation

I have the following function which is returning an array calculating the nearest neighbor:

def p_batch(U,X,Y):
    return [nearest(u,X,Y) for u in U]

I would like to replace the for loop using numpy. I've been looking into numpy.vectorize() as this seems to be the right approach, but I can't get it to work. This is what I've tried so far:

def n_batch(U,X,Y):
    vbatch = np.vectorize(nearest)
    return vbatch(U,X,Y)

Can anyone give me a hint where I went wrong?

Edit:

Implementation of nearest:

def nearest(u,X,Y):
    return Y[np.argmin(np.sqrt(np.sum(np.square(np.subtract(u,X)),axis=1)))]

Function for U,X,Y (with M=20,N=100,d=50):

U = numpy.random.mtrand.RandomState(123).uniform(0,1,[M,d])
X = numpy.random.mtrand.RandomState(456).uniform(0,1,[N,d])
Y = numpy.random.mtrand.RandomState(789).randint(0,2,[N])

Upvotes: 1

Views: 405

Answers (1)

Divakar
Divakar

Reputation: 221754

Approach #1

You could use Scipy's cdist to generate all those euclidean distances and then simply use argmin and index into Y -

from scipy.spatial.distance import cdist

out = Y[cdist(U,X).argmin(1)]

Sample run -

In [76]: M,N,d = 5,6,3
    ...: U = np.random.mtrand.RandomState(123).uniform(0,1,[M,d])
    ...: X = np.random.mtrand.RandomState(456).uniform(0,1,[N,d])
    ...: Y = np.random.mtrand.RandomState(789).randint(0,2,[N])
    ...: 

# Using a loop comprehension to verify values
In [77]: [nearest(U[i], X,Y) for i in range(len(U))]
Out[77]: [1, 0, 0, 1, 1]

In [78]: Y[cdist(U,X).argmin(1)]
Out[78]: array([1, 0, 0, 1, 1])

Approach #2

Another way with sklearn.metrics.pairwise_distances_argmin_min to give us those argmin indices directly -

from sklearn.metrics import pairwise

Y[pairwise.pairwise_distances_argmin(U,X)]

Runtime test with M=20,N=100,d=50 -

In [90]: M,N,d = 20,100,50
    ...: U = np.random.mtrand.RandomState(123).uniform(0,1,[M,d])
    ...: X = np.random.mtrand.RandomState(456).uniform(0,1,[N,d])
    ...: Y = np.random.mtrand.RandomState(789).randint(0,2,[N])
    ...: 

Testing between cdist and pairwise_distances_argmin -

In [91]: %timeit cdist(U,X).argmin(1)
10000 loops, best of 3: 55.2 µs per loop

In [92]: %timeit pairwise.pairwise_distances_argmin(U,X)
10000 loops, best of 3: 90.6 µs per loop

Timings against loopy version -

In [93]: %timeit [nearest(U[i], X,Y) for i in range(len(U))]
1000 loops, best of 3: 298 µs per loop

In [94]: %timeit Y[cdist(U,X).argmin(1)]
10000 loops, best of 3: 55.6 µs per loop

In [95]: %timeit Y[pairwise.pairwise_distances_argmin(U,X)]
10000 loops, best of 3: 91.1 µs per loop

In [96]: 298.0/55.6   # Speedup with cdist over loopy one
Out[96]: 5.359712230215827

Upvotes: 2

Related Questions