Vectorize Operations in Numpy for Two Dependent Arrays

Question

I have an n x n numpy array that contains all pairwise distances and another 1 x n array that contains some scoring metric.

Example:

import numpy as np
import scipy.spatial.distance

dists = scipy.spatial.distance.squareform(np.array([3.2,4.1,8.8,.6,1.5,9.,5.0,9.9,10.,1.1]))

array([[  0. ,   3.2,   4.1,   8.8,   0.6],
       [  3.2,   0. ,   1.5,   9. ,   5. ],
       [  4.1,   1.5,   0. ,   9.9,  10. ],
       [  8.8,   9. ,   9.9,   0. ,   1.1],
       [  0.6,   5. ,  10. ,   1.1,   0. ]])

score = np.array([19., 1.3, 4.8, 6.2, 5.7])

array([ 19. ,   1.3,   4.8,   6.2,   5.7])

So, note that the ith element of the score array corresponds to the ith row of the distance array.

What I need to do is vectorize this process:

For the ith value in the score array, find all other values that are larger than the ith value and note their indices
Then, in the ith row of the distance array, get all of the distances with the same indices as noted in step 1. above and return the smallest distance
In the case where the ith value in the score array is the largest, then the smallest distance is set as the largest distance found in the distance array

Here is an un-vectorized version:

n = score.shape[0]
min_dist = np.full(n, np.max(dists))
for i in range(score.shape[0]):
    inx = numpy.where(score > score[i])
    if len(inx[0]) > 0:
        min_dist[i] = np.min(dists[i, inx])

min_dist

array([ 10. ,   1.5,   4.1,   8.8,   0.6])

This works but is pretty inefficient in terms of speed and my arrays are expected to be much, much larger. I am hoping to improve the efficiency by using faster vectorized operations to achieve the same result.

Update: Based on Oliver W.'s answer, I came up with my own that doesn't require making a copy of the distance array

def new_method (dists, score):
    mask = score > score.reshape(-1,1)
    return np.ma.masked_array(dists, mask=~mask).min(axis=1).filled(dists.max())

One could in theory make it a one-liner but it's already a bit challenging to read to the untrained eye.

Vectorize Operations in Numpy for Two Dependent Arrays

Answers (1)

Related Questions