*Vectorized* way to find indices of minimums for each column (excluding all already found indices)

Question

I have the following square DataFrame:

In [104]: d
Out[104]:
           a          b          c          d          e
a        inf   5.909091   8.636364   7.272727   4.454545
b   7.222222        inf   8.666667   7.666667   1.777778
c  15.833333  13.000000        inf   9.166667  14.666667
d   4.444444   3.833333   3.055556        inf   4.833333
e  24.500000   8.000000  44.000000  43.500000        inf

this is modified distance matrix, representing pairwise distance between objects ['a','b','c','d','e'], where each row is divided by a coefficient (weight) and all diagonal elements artificially set to np.inf.

How may I get a list/vector of indices like as follows in an efficient (vectorized) way:

d   # index of minimal element in the column `a`
a   # index of minimal element in the column `b` (excluding already found indices: [d]) 
b   # index of minimal element in the column `c` (excluding already found indices: [d,a]) 
c   # index of minimal element in the column `d` (excluding already found indices: [d,a,b])

I.e. in the first column we had found index d, so when we search for a minimum in the second column we are excluding row with index d (found previously in the first column) - this would be a.

When we are looking for the minimum in the third column we are excluding rows with indices found previously (['d','a']) - this would be b.

When we are looking for the minimum in the fourth column we are excluding rows with indices found previously (['d','a','b']) - this would be c.

I don't need diagonal (inf) elements, so the resulting list/vector will contain d.shape[0] - 1 elements.

I.e. the resulting list will look like: ['d','a','b','c'] or in case of Numpy solution the corresponding numerical indices: [3,0,1,2]

It's not a problem to do it using slow for loop solution, but I can't wrap my head around a vectorized (fast) solution...

jpp · Accepted Answer

A loop is the only solution I can see here.

But you can use numpy + numba to optimise.

from numba import jit

@jit(nopython=True)
def get_min_lookback(A, res):
    for i in range(A.shape[1]):
        res[i] = np.argmin(A[:, i])
        A[res[i], :] = np.inf
    return res

arr = df.values

get_min_lookback(arr, np.zeros(arr.shape[1], dtype=int))

# array([3, 0, 1, 2, 0])

Vectorized way to find indices of minimums for each column (excluding all already found indices)

Answers (2)

Related Questions