MaxU - stand with Ukraine
MaxU - stand with Ukraine

Reputation: 210832

*Vectorized* way to find indices of minimums for each column (excluding all already found indices)

I have the following square DataFrame:

In [104]: d
Out[104]:
           a          b          c          d          e
a        inf   5.909091   8.636364   7.272727   4.454545
b   7.222222        inf   8.666667   7.666667   1.777778
c  15.833333  13.000000        inf   9.166667  14.666667
d   4.444444   3.833333   3.055556        inf   4.833333
e  24.500000   8.000000  44.000000  43.500000        inf

this is modified distance matrix, representing pairwise distance between objects ['a','b','c','d','e'], where each row is divided by a coefficient (weight) and all diagonal elements artificially set to np.inf.

How may I get a list/vector of indices like as follows in an efficient (vectorized) way:

d   # index of minimal element in the column `a`
a   # index of minimal element in the column `b` (excluding already found indices: [d]) 
b   # index of minimal element in the column `c` (excluding already found indices: [d,a]) 
c   # index of minimal element in the column `d` (excluding already found indices: [d,a,b]) 

I.e. in the first column we had found index d, so when we search for a minimum in the second column we are excluding row with index d (found previously in the first column) - this would be a.

When we are looking for the minimum in the third column we are excluding rows with indices found previously (['d','a']) - this would be b.

When we are looking for the minimum in the fourth column we are excluding rows with indices found previously (['d','a','b']) - this would be c.

I don't need diagonal (inf) elements, so the resulting list/vector will contain d.shape[0] - 1 elements.


I.e. the resulting list will look like: ['d','a','b','c'] or in case of Numpy solution the corresponding numerical indices: [3,0,1,2]

It's not a problem to do it using slow for loop solution, but I can't wrap my head around a vectorized (fast) solution...

Upvotes: 6

Views: 132

Answers (2)

jpp
jpp

Reputation: 164623

A loop is the only solution I can see here.

But you can use numpy + numba to optimise.

from numba import jit

@jit(nopython=True)
def get_min_lookback(A, res):
    for i in range(A.shape[1]):
        res[i] = np.argmin(A[:, i])
        A[res[i], :] = np.inf
    return res

arr = df.values

get_min_lookback(arr, np.zeros(arr.shape[1], dtype=int))

# array([3, 0, 1, 2, 0])

Upvotes: 2

MaxU - stand with Ukraine
MaxU - stand with Ukraine

Reputation: 210832

Here is my solution, which I'm sure is not the best one:

resulting list:

res = []

main function, that will search for a minimum in a column, excluding previously found indices and adding found index to res:

def f(col):
    ret = col.loc[~col.index.isin(res)].idxmin()
    if ret not in res:
        res.append(ret)

apply function to each column:

_ = d.apply(f)

result:

In [55]: res
Out[55]: ['d', 'a', 'b', 'c', 'e']

excluding last element:

In [56]: res[:-1]
Out[56]: ['d', 'a', 'b', 'c']

Upvotes: 1

Related Questions